INTRODUCTION:


In  epithelial  ovarian  cancer  (EOC),  tumor-infiltrating  CD8+  T  cells  are  strongly 
associated  with  increased  progression-free  and  overall  survival  following  chemotherapy. 
However,  the  antigens  recognized  by  tumor-infiltrating  T  cells  remain  largely  unknown. 
Furthermore,  it  is  not  clear  how  tumor-infiltrating  T  cells,  having  failed  to  prevent  the  primary 
tumor,  can  oppose  tumor  recurrence.  Recent  work  has  shown  that  chemotherapy,  in  the  context 
of  impaired  DNA  repair  pathways,  induces  mutations  in  the  cancer  genome,  some  of  which 
contribute  to  chemo-resistance.  Chemotherapy-induced  mutations  may  provide  a  new  source  of 
tumor  antigens  for  T  cells,  since  mutated  or  aberrantly  expressed  proteins  should  be  perceived 
as  “non-self”.  Based  on  these  considerations,  we  hypothesize  that  the  mutational  effects  of 
chemotherapy  generate  new  tumor  antigens  that  trigger  a  second  wave  of  CD8+  T  cell 
responses,  which  in  turn  promote  favorable  clinical  outcomes.  To  test  this  hypothesis,  we  are 
using  serological  and  genomic  methods  to  identify  the  evolving  repertoire  of  tumor  antigens  and 
T  cell  responses  in  EOC  patients  who  demonstrate  favorable  clinical  outcomes. 

The  study  has  five  tasks: 

Task  1.  Collection  and  processing  of  biospecimens 

Task  2.  To  determine  whether  chemotherapy  induces  the  emergence  of  new  tumour- 
associated  CD8+  T  cell  clones  in  EOC. 

Task  3.  To  identify  by  serological  approaches  tumor  antigens  induced  by  chemotherapy 
in  EOC. 

Task  4.  To  identify  changes  to  the  tumor  transcriptome  induced  by  chemotherapy  in 
EOC. 

Task  5.  To  determine  whether  tumor-associated  CD8+  T  cells  in  EOC  recognize  putative 
chemotherapy-induced  antigens. 


Significance.  This  will  be  the  first  study  to  test  the  hypothesis  that  the  mutational  effects  of 
platinum/taxane-based  chemotherapy  generate  novel  antigens  that  stimulate  host  CD8+  T  cell 
responses.  With  a  better  understanding  of  how  T  cell  responses  evolve  during  standard 
treatments,  we  believe  it  will  be  possible  in  future  to  prolong  progression-free  survival  by 
enhancing the host T cell response using immunomodulatory agents, vaccines or adoptively
transferred  T  cells. 


BODY: 


Task  1.  Collection  and  processing  of  biospecimens. 

This  task  concerns  the  collection  of  matched  primary  and  recurrent  ascites  from  6 
patients with high-grade serous ovarian cancer; isolation and storage of CD45+ leukocyte,
CD8+ lymphocyte and CD45- tumor cell subfractions; and collection of blood samples before,
during  and  after  primary  surgery  and  chemotherapy. 

Progress  to  date:  As  anticipated  in  the  original  proposal,  we  are  currently  focusing  on 
specimen  collection.  To  date,  we  have  collected  3  matched  primary  and  recurrent  ascites  tumor 
specimens.  However,  the  main  challenge  is  to  collect  these  from  women  with  a  favorable 
progression-free  interval  (PFI)  (ideally  >  24  months).  Currently,  we  have  7  patients  who  have 
donated  primary  tumor  specimens  already  and  who  are  approaching  our  PFI  criterion.  We  are 
making  every  effort  to  monitor  these  patients  and  collect  recurrent  ascites  specimens  as  they 
become  available. 

Task  2.  To  determine  whether  chemotherapy  induces  the  emergence  of  new  tumour- 
associated  CD8+  T  cell  clones  in  EOC. 


We  proposed  to  study  tumor-infiltrating  T  cells  from  primary  and  recurrent  ascites  to  look 
for  changes  in  T  cell  subsets  (CD3,  CD4,  CD8),  and  various  activation  and  differentiation 
markers.  We  also  proposed  to  sequence  T  cell  receptors  (TCRs)  to  identify  10-20  predominant 
T  cell  clones  from  recurrent  tumors.  Serial  blood  and  ascites  samples  will  then  be  analyzed  by 
QPCR  with  clonotype-specific  primers  to  determine  the  time  of  emergence  of  these  T  cell 
clones.  Our  hypothesis  predicts  that  a  large  proportion  of  CD8+  T  cell  clones  present  in 
recurrent  ascites  will  have  arisen  during  or  after  chemotherapy. 
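
The clone-selection step described above can be illustrated with a minimal sketch. It assumes TCR sequencing has already produced one CDR3 amino-acid sequence per read; the function names and the example sequences are hypothetical and are not taken from the study. The presence/absence comparison stands in for the clonotype-specific QPCR tracking proposed here and is far less sensitive than that assay.

```python
from collections import Counter

def top_clonotypes(cdr3_reads, n=10):
    """Return the n most frequent CDR3 sequences with their read fractions."""
    counts = Counter(cdr3_reads)
    total = sum(counts.values())
    return [(seq, count / total) for seq, count in counts.most_common(n)]

def newly_emerged(recurrent_top, primary_reads):
    """Clonotypes prominent at recurrence that have no reads in the primary sample."""
    primary = set(primary_reads)
    return [seq for seq, _ in recurrent_top if seq not in primary]

# Hypothetical CDR3 sequences, for illustration only.
primary_reads = ["CASSLGQGYEQYF"] * 5 + ["CASSPGTGNTEAFF"] * 3
recurrent_reads = ["CASSLGQGYEQYF"] * 4 + ["CASSQDRGNQPQHF"] * 6

top = top_clonotypes(recurrent_reads, n=2)
print(top)
print(newly_emerged(top, primary_reads))
```

In this toy input, the clone that dominates the recurrent sample has no reads in the primary sample, which is the pattern the hypothesis predicts for clones arising during or after chemotherapy.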

Progress  to  date: 

We  have  started  experiments  for  Task  2  using  specimens  from  a  patient  (IROC024)  who 
fell  a  few  months  short  of  our  PFI  criterion,  but  is  nonetheless  of  great  interest  for  several 
reasons:  (a)  we  have  collected  primary  ascites  as  well  as  matched  ascites  from  her  first,  second 
and  third  recurrence;  (b)  there  are  ample  vials  of  tumor  cells  available  from  each  time  point,  so 
we  can  use  these  specimens  to  develop  our  methodology  without  fear  of  squandering  precious 
samples;  and  (c)  a  full  complement  of  serial  blood  samples  is  available  for  later  experiments  to 
track T cell clones over time. Using these specimens, we have successfully separated tumor
cells  and  T  cells  by  flow  cytometry,  and  these  samples  have  now  been  sent  to  Rob  Holt’s  lab  for 
RNA  isolation,  whole  transcriptome  sequencing,  and  TCR  sequencing.  Figure  1  summarizes 
the  clinical  history  and  available  specimens  for  IROC024. 

Figure 1. Clinical history and available specimens for IROC024, a patient with
high-grade serous ovarian cancer. Tumor burden is indicated by CA125
measurements (Y axis) over time (X axis). Also shown are the time points at
which ascites cells (yellow squares) and peripheral blood mononuclear cells
(PBMC; green squares) were collected. Chemotherapy protocols are shown
(GOOVCATX = carboplatin and paclitaxel; GOOVCAG = carboplatin and
gemcitabine; GOOVLDOX = pegylated liposomal doxorubicin). Note that ascites
cells (containing tumor) were successfully collected at primary surgery, first
recurrence, and second recurrence. The patient survived for only 548 days (1.5
years). 

As  part  of  our  flow  cytometry  experiments,  we  have  also  identified  a  subpopulation  of 
tumor-infiltrating  CD8+  T  cells  that  express  the  activation  marker  CD103.  We  hypothesize  that 
the  CD103+  subset  is  enriched  for  tumor-reactive  T  cell  clones.  For  details,  please  see  Figure  1 
of  Webb  et  al.,  2010  (Appendix  A). 

In  addition,  we  have  developed  and  published  new  methods  to  sequence  the  TCR 
repertoire  from  virtually  any  human  specimen  using  next-generation  sequencing  technology 
(see  manuscripts  by  Warren  et  al.  2009  and  Freeman  et  al.  2009,  Appendices  B  and  C). 
Indeed,  we  were  the  first  group  to  apply  this  sequencing  technology  to  the  human  T  cell 
repertoire  and,  in  one  experiment,  we  surpassed  the  number  of  human  TCR  sequences 
identified  by  all  other  labs  to  that  point  in  history. 

Finally,  we  performed  some  very  basic  analysis  of  the  effects  of  chemotherapy  on  T  cells 
in  EOC  patients.  To  our  surprise,  we  found  that  the  absolute  lymphocyte  count  (ALC)  changes 
very  little  during  chemotherapy  in  most  patients.  Instead,  patients  presented  with  high,  medium 
or low ALC values, and these levels persisted through treatment. Remarkably, we noted that
patients  with  low  ALC  values  had  decreased  survival  rates  (Figure  2  below).  We  are  currently 
investigating  whether  high  ALC  values  correlate  with  tumor-infiltrating  CD8+  T  cells,  which  might 
offer an explanation for this unexpected finding. These results are being prepared for publication
(Milne  et  al.,  in  preparation). 


Figure  2.  Absolute  lymphocyte  count  (ALC)  is  associated  with  survival  in  ovarian 
cancer.  Kaplan-Meier  analysis  of  overall  survival  for  44  patients  with  suboptimally 
debulked  high-grade  serous  ovarian  cancer.  Patients  were  stratified  into  upper 
(High)  and  lower  (Low)  quartiles  based  on  average  ALC  values  recorded  during 
primary  chemotherapy.  Note  that  patients  with  low  ALC  have  significantly 
decreased  survival. 
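
For readers unfamiliar with the Kaplan-Meier (product-limit) estimate used in Figure 2, the following is a minimal sketch of how such a curve is computed for one stratum (for example, the low-ALC quartile). The follow-up times and event flags are invented for illustration and are not patient data; the function name is likewise an assumption.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability), one entry per event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the fraction of at-risk patients surviving this time point.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= leaving
    return curve

# Hypothetical follow-up in months; 0 marks a censored patient.
months = [6, 12, 12, 20, 30]
died = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(months, died):
    print(t, round(s, 3))
```

Comparing two strata, as in Figure 2, would additionally require a test such as the log-rank test, which this sketch does not implement.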

Task  3.  To  identify  by  serological  approaches  tumor  antigens  induced  by  chemotherapy 
in  EOC. 


We  proposed  to  construct  autologous  cDNA  libraries  in  a  yeast-based  expression 
system  using  mRNA  from  recurrent  ascites  tumor  cells.  Libraries  will  then  be  screened  with 
patient  sera  to  identify  tumor  antigens  recognized  by  IgG  autoantibodies.  For  each  antigen,  we 
will  determine  whether  expression  is  seen  in  the  primary  and/or  recurrent  tumor  tissue  (by 
QPCR);  whether  the  antigen  shows  sequence  alterations;  and  the  clinical  time  point  at  which 
antibody  responses  develop.  The  overarching  goal  is  to  identify  new  antigens  that  arise  during 
or  after  chemotherapy. 

Progress  to  date:  Like  Task  1,  this  aim  depends  on  the  collection  of  matched  primary 
and  recurrent  ascites  tumor  specimens  from  patients  experiencing  a  favorable  PFI.  Therefore, 
we  are  still  in  the  anticipated  waiting  period.  In  the  meantime,  however,  we  have  used 
serological  methods  to  analyze  changes  in  tumor-specific  antibodies  as  patients  go  through 
standard  treatment.  To  this  end,  we  assembled  a  cohort  of  23  EOC  patients  from  whom  we 
have serial blood samples spanning from pre-treatment through to at least one year
post-treatment. Initially, we are studying antibody responses using the simple but effective method of
Western  blotting,  as  per  our  prior  study  in  prostate  cancer  (Nesslinger  2007).  Remarkably,  43% 
of  patients  (10/23)  showed  the  development  of  new  antibody  responses  to  tumor-associated 
antigens  within  3-12  months  of  standard  treatment  (Figure  3). 

Figure 3. Representative immunoblots showing the development of
autoantibodies to tumor-associated antigens in ovarian cancer. Serial serum
samples were collected at the indicated time points from two patients (A,
IROC045 and B, IROC002) with high-grade serous ovarian cancer. Serum was
diluted  1:200  and  probed  against  lysate  from  the  human  ovarian  cancer  cell  line 
OVCAR3,  followed  by  anti-human  secondary  antibody  and  standard  detection  by 
enhanced  chemiluminescence.  Arrows  denote  the  locations  of  immunoreactive 
bands  that  arise  after  standard  treatment  (surgery  and  chemotherapy).  These 
bands  represent  antigens  against  which  the  patients  developed  serum 
autoantibodies. 

The  above  results  are  consistent  with  our  hypothesis  that  chemotherapy  triggers 
immune  responses  against  tumors.  We  are  currently  attempting  to  clone  the  antigens  underlying 
these  antibody  responses  using  SEREX  methodology,  which  is  similar  to  that  proposed  in  the 
grant. 

Task 4. To identify changes to the tumor transcriptome induced by chemotherapy in
EOC.


To  identify  additional  candidate  tumor  antigens,  we  proposed  to  subject  primary  and 
recurrent  tumor  cells  to  whole-transcriptome  cDNA  sequencing  using  a  massively  parallel 
sequencing platform (Illumina). Data will then be analyzed to identify sequence alterations or
changes  in  expression  level  that  differ  between  primary  and  recurrent  tumor  tissue  using 
constitutional  DNA  from  matched  PBMC  as  a  reference.  We  expect  to  identify  sequence 
alterations  arising  in  recurrent  tumor  cells,  including  point  mutations,  deletions  and 
rearrangements.  Potential  CD8+  T  cell  epitopes  will  be  identified  using  computer  algorithms 
predictive  of  MHC  binding. 
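
As an illustration of the step that precedes MHC-binding prediction, the sketch below enumerates every peptide of a fixed length that spans a point mutation; these are the candidates that would then be scored by a prediction algorithm. The 9-residue length is a common choice for MHC class I ligands and is an assumption here, as are the function name and the toy protein sequence. No binding prediction is performed.

```python
def mutant_peptides(protein, position, mutant_residue, length=9):
    """List every peptide of the given length that spans a point mutation.

    protein        -- wild-type amino-acid sequence (single-letter codes)
    position       -- 0-based index of the substituted residue
    mutant_residue -- single-letter code of the new residue
    """
    mutated = protein[:position] + mutant_residue + protein[position + 1:]
    # Windows must contain the mutated position and stay inside the sequence.
    first = max(0, position - length + 1)
    last = min(position, len(mutated) - length)
    return [mutated[i:i + length] for i in range(first, last + 1)]

# Toy sequence, for illustration only: substitute residue 10 (M) with A.
toy_protein = "ACDEFGHIKLMNPQRSTVWY"
peptides = mutant_peptides(toy_protein, 10, "A")
print(len(peptides))
print(peptides[0], peptides[-1])
```

A mutation far from either end of the protein yields as many candidates as the peptide length (nine here); mutations near the termini yield fewer.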

Progress  to  date:  Like  Tasks  1  and  3,  this  aim  depends  on  the  collection  of  matched 
primary  and  recurrent  ascites  tumor  specimens  from  patients  experiencing  a  favorable  PFI. 
Therefore,  we  are  still  in  the  anticipated  waiting  period.  In  the  meantime,  however,  we  have 
improved our whole-transcriptome cDNA sequencing method. In the original proposal we
indicated we would generate approximately 2 billion bp from each of the 12 paired EOC
samples. Now, owing to improvements in read length and cluster density that we have implemented
at the Genome Sciences Centre, we can generate 20 billion bp per library on the
Illumina platform at no additional cost. We will leverage this tenfold increase in sequencing
depth to improve our confidence in mutation identification and to find mutations in genes with
lower expression levels than was initially possible. Further, new methods have recently become
available to reliably capture and sequence only the coding portions of the genome (i.e., the “exome”),
and we are currently validating these methods at our centre. We will explore the possibility of
using these two orthogonal approaches, whole-transcriptome shotgun sequencing (WTSS) and exome
sequencing, to further increase the
acuity of mutation detection, such that we identify the best candidate epitopes for functional
evaluation in the subsequent aim. As described above for Task 2, we are in the process of
applying these improved sequencing methods to our first EOC tumor specimen, from IROC024.

Task  5.  To  determine  whether  tumor-associated  CD8+  T  cells  in  EOC  recognize  putative 
chemotherapy-induced  antigens. 

We proposed to use interferon-γ ELISPOT to test candidate antigens from Tasks 2 and 3
for  recognition  by  CD8+  T  cells  from  recurrent  ascites.  RNA-transfected  or  peptide-pulsed 
CD40L-stimulated  B  cells  will  serve  as  antigen  presenting  cells.  The  clinical  time  point  at  which 
antigen-specific  T  cell  responses  arise  will  be  assessed  by  ELISPOT  analysis  of  serial  blood 
samples  and  primary  vs.  recurrent  ascites  samples. 

Progress to date: This aim depends on the results of Tasks 2 and 3; therefore, it is not
expected  to  commence  for  1-2  more  years.  Nonetheless,  we  have  been  developing  our 
ELISPOT  platform  in  preparation  for  this  work.  Specifically,  we  have  optimized  methods  for 
performing  in  vitro  stimulation  and  expansion  of  antigen-specific  human  CD8+  T  cells.  For  this 
purpose,  we  have  used  the  melanoma  tumor  antigen  MART-1,  which  is  often  used  as  a  model 
antigen  for  tumor  immunology  studies.  Our  methods  are  optimized  to  the  point  where  we  can 
activate and expand naive MART-1-specific CD8+ T cells reliably from normal donors. Thus, we
are  now  prepared  to  analyze  novel  tumor  antigens  as  they  arise  from  Tasks  2  and  3. 

While  ELISPOT  is  a  rapid  and  sensitive  method  to  quantify  T  cell  responses,  it  gives 
only  a  partial  assessment  of  the  phenotype  of  the  T  cell  responses.  A  better  index  may  be  the 
so-called “polyfunctional” status of T cells, wherein the cells' capacity to produce a spectrum of
different  cytokines  is  measured.  Therefore,  we  have  optimized  the  use  of  multi-parameter  flow 
cytometry  to  assess  T  cell  polyfunctionality  in  EOC.  Using  this  method,  we  have  found  that  T 
cell  polyfunctionality  in  response  to  polyclonal  stimulation  (anti-CD3)  is  suppressed  in  the  tumor 
environment  of  most  (but  not  all)  EOC  patients,  and  that  this  suppression  can  be  reversed  by 
the  addition  of  the  stimulatory  cytokines  IL-2,  IL-12  and  IL-18  (for  example,  see  Fig.  3  of  Tran 
et  al.,  Appendix  D). 
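
To make the notion of polyfunctionality concrete, the sketch below tallies how many cells produce zero, one, two or more cytokines from a panel, which is the kind of summary typically derived from multi-parameter flow cytometry data. The cytokine panel, the per-cell data and the function name are hypothetical; the report does not specify which cytokines were measured.

```python
from collections import Counter

def polyfunctionality(cells, cytokines):
    """Fraction of cells producing 0, 1, 2, ... of the listed cytokines.

    cells     -- list of sets; each set holds the cytokines one cell stained positive for
    cytokines -- the panel of cytokines being scored
    """
    panel = set(cytokines)
    counts = Counter(len(cell & panel) for cell in cells)
    total = len(cells)
    return {k: counts.get(k, 0) / total for k in range(len(cytokines) + 1)}

# Hypothetical panel and four hypothetical cells, for illustration only.
panel = ["IFN-g", "TNF-a", "IL-2"]
cells = [
    {"IFN-g", "TNF-a", "IL-2"},  # produces all three cytokines
    {"IFN-g", "TNF-a"},
    {"IFN-g"},
    set(),                       # produces none
]
print(polyfunctionality(cells, panel))
```

Under this summary, suppression in the tumor environment would appear as a shift of cells toward the lower counts, and reversal by stimulatory cytokines as a shift back toward the higher counts.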

KEY  RESEARCH  ACCOMPLISHMENTS: 

•  Specimen  collection  is  proceeding  as  planned,  and  7  patients  are  currently  candidates  to 
provide  the  necessary  materials  for  this  study. 

•  We  optimized  flow  cytometric  methods  for  isolating  tumor  cells  and  T  cells  for  massively 
parallel sequencing. We have also acquired evidence that the CD103+ subset of CD8+ T
cells is enriched for tumor-reactive clones (Webb et al., 2010, Appendix A).

• We optimized methods for whole transcriptome shotgun sequencing, and the first patient
sample  is  now  being  analyzed  (Figure  1). 

•  We  found  an  association  between  absolute  lymphocyte  counts  (ALC)  and  survival  in  EOC 
(Figure  2)  (Milne  et  al.,  in  preparation). 

•  We  developed  an  innovative  method  to  analyze  the  human  T  cell  repertoire  using  massively 
parallel  sequencing  (Warren  et  al.  2009,  Freeman  et  al.  2009,  Appendices  B  and  C). 

•  We  showed  that  over  40%  of  EOC  patients  develop  autoantibody  responses  to  tumor- 
associated  antigens  during  standard  treatment  (Figure  3). 

•  We  optimized  methods  for  in  vitro  expansion  of  human  CD8+  T  cells  for  ELISPOT. 

• We developed methods for analyzing the functional status of CD8+ T cells by
multi-parameter flow cytometry (Tran et al., submitted, Appendix D).

REPORTABLE  OUTCOMES: 

Published  manuscripts: 

1. Nelson BH. IDO and outcomes in ovarian cancer. Gynecol Oncol. 2009 Nov;115(2):179-80.
PubMed PMID: 19822256.

2. Milne K, Kobel M, Kalloger SE, Barnes RO, Gao D, Gilks CB, Watson PH, Nelson BH.
Systematic  analysis  of  immune  infiltrates  in  high-grade  serous  ovarian  cancer  reveals 
CD20,  FoxP3  and  TIA-1  as  positive  prognostic  factors.  PLoS  One.  2009  Jul 
29;4(7):e6412.  PubMed  PMID:  19641607. 

3.  Freeman  JD,  Warren  RL,  Webb  JR,  Nelson  BH,  Holt  RA.  Profiling  the  T-cell  receptor 
beta-chain  repertoire  by  massively  parallel  sequencing.  Genome  Res.  2009 
Oct;19(10):1817-24.  Epub  2009  Jun  18.  PubMed  PMID:  19541912. 

4.  Warren  RL,  Nelson  BH,  Holt  RA.  Profiling  model  T-cell  metagenomes  with  short  reads. 
Bioinformatics.  2009  Feb  15;25(4):458-64.  Epub  2009  Jan  9.  PubMed  PMID:  19136549. 

5.  Warren  RL  and  Holt  RA.  A  census  of  predicted  mutational  epitopes  for  immunological 
cancer  control.  Hum.  Immunol.  2010  Mar;  71(3):245-54. 

6. Webb, J.R., Wick, D.A., Tran, E., Nielsen, J.S., Milne, K., McMurtrie, E. and Nelson, B.H.
2010. Expression of the intraepithelial lymphocyte marker αE/β7 integrin (CD103) is
associated with tumor-reactive CD8+ T cells in ovarian cancer malignant ascites.
Gynecol Oncol. 2010 Sep;118(3):228-36. Epub 2010 Jun 11. PubMed PMID: 20541243.

Submitted  manuscripts: 

1.  Tran,  E.,  Nielsen,  J.S.,  Wick,  D.A.,  Ng,  A.V.,  Nesslinger,  N.J.,  McMurtrie,  E.,  Webb,  J.R., 
Nelson,  B.H.  2010.  Polyfunctional  T-cell  responses  are  disrupted  by  the  ovarian  cancer 
ascites environment and only partially restored by clinically relevant cytokines.
Submitted.

Leveraged funding:

Since  this  grant  from  the  DOD  was  awarded,  we  have  received  funding  for  the  following  related 
projects: 

1.  Canadian  Institutes  of  Health  Research  (CIHR)  -  Grant#:MOP97897 

10/2009-09/2012

Tumor-infiltrating  T  cells  in  ovarian  cancer:  functional  impact  on  patient  survival 
Goal:  To  define  the  functional  phenotype  of  tumor-infiltrating  T  cells  in  ovarian  cancer. 

PI:  Brad  Nelson 

Co-PIs: John Webb, Peter Watson, Julian Lum

2. Canadian Institutes of Health Research (CIHR) - Grant#: CSB94217

04/2009-03/2010

The Ovarian Cancer Immune Epitope Database

Goal: To use proteomic methods to identify T cell epitopes for the immunotherapy of ovarian
cancer.

PI: John Webb

Co-PIs: Christoph Borchers, Brad Nelson, Julian Lum

3. Canadian Institutes of Health Research (CIHR) - Grant#: MOP-102679
04/2010-03/2014 

Characterizing  the  human  T-cell  receptor  repertoire  by  massively  parallel  sequencing 
Goal:  To  characterize  individual  variation  in  the  human  T-cell  repertoires  at  sequence  level 
resolution,  using  a  comparative  approach. 

PI:  Rob  Holt 
Co-PI:  John  Webb 

CONCLUSION: 

Overall,  this  study  is  progressing  on  schedule  and  on  budget,  with  no  significant  deviations  from 
the  original  proposal.  We  are  going  through  the  anticipated  waiting  period  associated  with  the 
collection  of  recurrent  tumor  specimens  from  patients  experiencing  a  favorable  progression-free 
interval.  In  the  meantime,  we  have  optimized  methods  for  cell  sorting  of  tumor  cells,  whole 
transcriptome  shotgun  sequencing,  and  in  vitro  expansion  of  human  T  cells  for  ELISPOT.  We 
have  published  5  relevant  manuscripts  in  2009,  and  two  are  under  review.  Additional, 
complementary  funding  has  been  received  or  requested  from  several  other  agencies,  enhancing 
the  strength  of  our  ovarian  cancer  research  program. 

Introduction. Tumor-infiltrating CD8+ T cells are strongly associated with survival in high-grade serous
ovarian cancer, but their functional phenotype remains poorly defined. The mucosal integrin CD103 (αE/β7)
facilitates  the  infiltration  of  T  cells  into  epithelial  tissues,  including  gut  and  lung  mucosa,  solid  organ 
allografts,  and  various  epithelial  cancers.  We  reasoned  that  CD103  might  also  be  expressed  by  tumor- 
reactive  T  cells  in  ovarian  cancer. 

Methods. Flow cytometry was used to assess the frequency and phenotype of CD103-expressing T cells in
primary  ascites  fluid  from  13  patients  with  high-grade  serous  ovarian  cancer  and  2  patients  with  recurrent 
disease. 

Results.  We  report  that  a  subset  of  patients  with  advanced  serous  ovarian  cancer  have  profoundly 
elevated frequencies of CD103-expressing CD8+ cells in ascites (between 20% and 70% of CD8+ cells in
ascites were CD103+) and that CD103 expression correlated with levels of TGF-β in ascitic fluid. Conversely,
CD103 was not expressed on CD4+ cells, even in those patients with very high frequencies of CD8+CD103+
cells. CD8+CD103+ cells were antigen-experienced (CD45RA-CD45RO+CD62LloCCR7-) and of an
intermediate (EM2) effector memory phenotype (CD27+CD28-). TCR repertoire analysis indicated
significant skewing between CD8+CD103- and CD8+CD103+ T cell subsets, suggesting the two populations
contain distinct antigenic specificities. Lastly, HLA pentamer analysis revealed that one patient in the cohort
harbored a high frequency of CD8+ T cells in ascites that were specific for the tumor antigen NY-ESO-1, and
that ~75% of these NY-ESO-1 specific CD8+ T cells were CD103+.

Conclusions.  CD103+  may  be  a  marker  of  activated  and  tumor-reactive  CD8+  T  cells  in  high-grade  serous 
ovarian  cancer. 

©  2010  Elsevier  Inc.  All  rights  reserved. 


Introduction 

The integrin CD103 (αE/β7) is expressed on fewer than 2% of
circulating peripheral blood cells, but is widely expressed on
intraepithelial CD8+ T cells (IEL) present in the gut and lung mucosa
[1-3] and on tissue-infiltrating CD8+ T cells during allograft rejection
[4]. The only known ligand for CD103 is the epithelial cell surface
molecule, E-cadherin, and CD103/E-cadherin interactions are thought
to play an important role in the homing and retention properties of
intraepithelial lymphocytes [5]. Recently, CD8+CD103+ cells have
also been reported to exert regulatory functions via secretion of IL-10
as well as cell contact-mediated suppressive activity [6]. Despite the
apparent diversity of activities attributed to CD8+CD103+ cells, all
appear to be intimately associated with the availability of TGF-β,
which is an integral factor in the regulation of CD103 surface
expression [7].

* This work was supported by the British Columbia Cancer Foundation and by a grant
to J.R.W. and B.H.N. by the US Department of Defense. J.S.N is supported by a fellowship
from the Canadian Institutes of Health Research.

* Corresponding author. Deeley Research Centre, British Columbia Cancer Agency,
Victoria, BC, Canada V8R 6V5. Fax: +250 519 2040.

E-mail address: jwebb@bccancer.bc.ca (J.R. Webb).

0090-8258/$ - see front matter © 2010 Elsevier Inc. All rights reserved.
doi:10.1016/j.ygyno.2010.05.016

Increasing  evidence  suggests  that  interactions  between  CD103  and 
E-cadherin  also  play  an  important  role  in  specific  immunity  against 
cancers of epithelial origin. Specifically, CD103/E-cadherin interactions
have been shown to be critically important for recognition and
killing  of  tumor  cells  by  human  colon  carcinoma-specific  CTL  [8],  lung 
cancer-specific  CTL  [9]  and  pancreatic  cancer-specific  CTL  [10].  In 
addition,  elevated  frequencies  of  CD8+CD103+  tumor-infiltrating 
cells  have  been  reported  in  subsets  of  colorectal  cancer  [11]  and 
bladder cancer [12]. Perhaps the most intriguing aspect of the
association between CD103 and epithelial tumors is the finding that
CD103 expression can be induced on otherwise CD103-negative CD8+
T cells by inclusion of TGF-β in culture media during in vitro priming.
CD8+ T cells activated in the presence of TGF-β are then primed for
rapid CD103 re-expression upon subsequent antigen exposure, even
in the presence of very low levels of TGF-β [8]. In addition, CD103
expression  by  tumor-infiltrating  lymphocytes  has  been  shown  to  play 
a  critical  role  in  retaining  tumor-specific  lymphocytes  within  the 
tumor  microenvironment  [13].  Thus,  CD103  expression  by  tumor- 
specific  T  cells  appears  to  facilitate  appropriate  homing  to  relevant 
tumor  sites,  which  may  be  important  for  immunotherapy  of  epithelial 
cancers. 

Tumor-infiltrating  CD8+  T  cells  are  strongly  associated  with 
increased  progression-free  and  overall  survival  in  high-grade  serous 
epithelial  ovarian  cancer  (EOC)  [14].  Given  the  above  evidence 
implicating  CD103  in  the  localization  and  activation  of  tumor-specific 
T  cells  in  epithelial  cancers,  we  reasoned  CD103  might  also  be 
expressed  by  tumor-associated  T  cells  in  ovarian  cancer.  Herein,  we 
report  that  a  subset  of  EOC  patients  have  malignant  ascites  that 
contains  profoundly  elevated  numbers  of  CD8+CD103+  T  cells,  which 
are positively correlated with levels of TGF-β in ascites fluid.
Phenotypic analysis of these CD8+CD103+ cells suggests that they
are antigen-experienced, effector memory cells with a skewed TCR Vβ
repertoire compared to CD8+CD103- cells. Furthermore, our data
suggest that CD103 may be a marker of tumor antigen-specific CD8+
T  cells  in  EOC,  a  finding  which  has  implications  for  the  development  of 
effective  immune-based  treatments  for  this  challenging  disease. 

Materials  and  methods 

Patients  and  tissues 

Primary  tumor  tissue  and  malignant  ascites  were  obtained  from 
patients  with  high-grade  serous  EOC  undergoing  initial  de-bulking 
surgery.  Ascites  was  also  obtained  from  two  patients  with  recurrent 
disease.  All  specimens  and  clinical  data  were  obtained  with  written 
informed  consent  under  protocols  approved  by  the  Research  Ethics 
Board  of  the  British  Columbia  Cancer  Agency  and  the  University  of 
British Columbia. Cells in ascites samples were pelleted by centrifugation
and analyzed immediately by FACS or were cryopreserved in
liquid  nitrogen  for  future  analysis.  In  instances  where  significant 
numbers  of  red  blood  cells  (RBC)  were  present,  samples  were  treated 
with  ACK  lysis  buffer  (Biowhittaker)  prior  to  centrifugation.  After 
centrifugation, ascites fluid was retained for TGF-β ELISA analysis.
Bulk primary tumor tissue was digested overnight at 4 °C with a
combination of collagenase, DNase I and hyaluronidase (all from
Sigma), then passed through a 100-μm filter, centrifuged and
cryopreserved  as  above. 

Antibodies  and  FACS  analysis 

Single  cell  suspensions  from  malignant  ascites  or  primary  tumor 
tissue  were  surface  stained  with  antibodies  specific  for  CD4,  CD8, 
CD103,  CD25,  CD27,  CD28,  HLA-DR,  CCR7  and  CD137  as  indicated  (all 
from  BD  Biosciences)  and  were  analyzed  on  a  BD  FACSCalibur  using  a 
live  lymphocyte  gate.  TCR  spectratyping  was  performed  by  staining 
with  CD8  and  CD103  plus  a  combination  of  antibodies  specific  for  24 
different TCR Vβ chains (IOTEST BetaMark, Beckman Coulter). Where
indicated, cells were also stained with an HLA-A2 pentamer reagent
loaded with the HLA-A2 restricted epitope of NY-ESO-1 (NY-ESO-1 157-165)
(Proimmune) following manufacturer's instructions. For
intracellular cytokine analysis, cells were incubated in cRPMI media
(RPMI 1640, 10% FBS, 2 mM L-glutamine, 50 μM 2-mercaptoethanol,
10 mM HEPES, and 10 mM sodium pyruvate) for 6 h in the absence of
any additional stimulus, or in the presence of PMA (50 ng/ml) plus
ionomycin (250 ng/ml). Egress of cytokine from the cell was inhibited
by adding 2 μM monensin (GolgiStop, BD Biosciences) for the
duration of the incubation. After 6 h incubation, cells were recovered
by centrifugation, surface stained with the indicated antibodies, and
fixed and permeabilized using Cytofix/Cytoperm (BD Biosciences)
according to manufacturer's instructions. Intracellular cytokines were
then detected using anti-IFN-γ or biotinylated anti-IL-10 antibodies
plus streptavidin-PE (all from BD Biosciences). TGF-β levels in ascites
fluid were quantified by ELISA (eBioscience) according to
manufacturer's instructions, and results are reported as a combination of active
and latent TGF-β.

IFN-γ ELISPOT

ELISPOT  plates  (MSIP,  Millipore)  were  pre-coated  overnight  with 
10 μg/ml anti-IFN-γ capture antibody (mAb 1-D1K, Mabtech) and
then blocked for 2 h at 37 °C with cRPMI. Ascites cells (3 × 10^5 cells per
well) were plated in triplicate in the absence of any stimulus (media
only), or in the presence of 10 μg/ml melanA 26-35 peptide (irrelevant
HLA-A2 binding control peptide) or 10 μg/ml NY-ESO-1 157-165 peptide
(both peptides from Anaspec). After overnight incubation at 37 °C,
ELISPOT plates were washed and incubated for 2 h at 37 °C with 1 μg/ml
biotinylated anti-IFN-γ antibody (mAb 7-B6-1, Mabtech) followed
by  development  with  Vectastain  ABC  Elite  kit  and  Vectastain  AEC 
substrate  reagent  according  to  manufacturer's  instructions  (Vector 
Labs). 

In  vitro  expansion  of  ascites  T  cells 

Single cell suspensions from malignant ascites were incubated for
7 days in cRPMI (1 x 10^6 cells/ml) in the presence of 6000 U/ml IL-2
(Tecin, Roche) with or without 2 ng/ml TGF-β (Peprotech) as
indicated. After 7 days, a portion of the cell culture was used for
FACS analysis and the remainder was subjected to rapid expansion
protocol (REP). Briefly, REP cultures are comprised of IL-2-expanded
cells mixed at a 1:10 ratio with autologous irradiated (3500 rad)
PBMC plus anti-CD3 (30 ng/ml, OKT3, eBioscience) and IL-2
(50 U/ml, Tecin, Roche). The REP was performed in the presence or absence
of 2 ng/ml TGF-β (Peprotech), as indicated, for 14 days. Cells were
then harvested and re-stimulated (4 x 10^5 cells per ml) with media
only, immobilized anti-CD3 (5 µg/ml) or 2 x 10^4 autologous tumor
cells for 6 h in the presence of GolgiStop (BD Biosciences) and were
analyzed by FACS to detect CD8 and CD103 surface expression and
intracellular IFN-γ.

Results 

CD103  surface  expression  on  CD8+  T  cells  in  malignant  ascites  of 
ovarian  cancer  patients 

Approximately 30% of patients with high-grade serous EOC present
with ascites fluid (malignant ascites), which typically contains a variable
mixture of tumor cells and inflammatory cells. To better define the
nature of the host immune response to EOC, we collected ascites from 13
previously untreated EOC patients at the time of their primary
de-bulking surgery and analyzed resident lymphocyte populations by flow
cytometry using a panel of antibodies against lymphocyte surface
markers. In some ascites specimens, there was a dramatic accumulation
of CD8+ T cells that expressed CD103 (αE/β7 integrin) on the cell
surface (Fig. 1A). Similarly, we observed that a high proportion of CD8+
lymphocytes in solid tumor tissue also expressed CD103 on the cell
surface (Fig. 1B). The proportion of CD8+CD103+ cells in ascites varied
widely, ranging from approximately 3% (IROC008) to greater than 70%
(IROC033) of all CD8+ T cells (Fig. 1C). For three of the patients in the
cohort with a high frequency of CD103-expressing CD8+ T cells
(IROC024, IROC033 and IROC065) this corresponded to approximately
Fig. 1. CD8+CD103+ T cells in malignant ascites and solid tumors of serous EOC patients. (A and B) Flow cytometric analysis showing CD103 expression on CD4+ and CD8+ T cells in
(A) ascites or (B) enzymatically dissociated primary tumor from a single patient at the time of initial de-bulking surgery. (C) CD103 expression on CD4+ and CD8+ T cells in ascites
samples from 14 serous ovarian cancer patients, including one patient (IROC024) for which primary and recurrent samples (8 months after primary de-bulking) were available. Data
in panels A and B are plotted as the percentage of CD103-expressing cells within the total lymphocyte gate whereas data in panel C are plotted as the number of CD103-expressing
cells within the CD4 or CD8 lymphocyte population.


1.6 x 10^8, 3.3 x 10^8 and 1.1 x 10^8 total CD8+CD103+ cells, respectively,
in the volume of ascites fluid recovered during surgery (IROC024:
1800 ml, IROC033: 900 ml and IROC065: 475 ml).

Simultaneous analysis of the CD4 compartment revealed that,
unlike CD8+ T cells, only a small minority of CD4+ T cells in malignant
ascites, or solid tumor tissue, expressed CD103, even in those patients
with very high frequencies of CD8+CD103+ cells. Interestingly, during
the course of the study, one of the patients (IROC024) required a
second, palliative paracentesis procedure to remove a subsequent
ascites buildup (8 months after primary de-bulking surgery). A large
population of CD8+CD103+ cells was still present in this recurrent
ascites sample, suggesting that this lymphocyte subpopulation is
stable over time. CD8+CD103+ cells were also detected in the ascites
of a second patient with recurrent disease (IROC015). However, this
patient did not present with ascites at primary surgery; thus, no
comparison between primary and recurrent disease was possible.

The  only  known  ligand  for  CD103  is  the  epithelial  cell  surface 
adhesion  molecule  E-cadherin;  therefore,  we  assessed  whether  the 
presence  of  CD8+CD103+  T  cells  correlated  with  the  expression  of  E- 
cadherin  on  tumor  cells.  Immunohistochemical  analysis  of  paraffin- 
embedded  tumor  tissue  revealed  that  E-cadherin  was  expressed  on 
all  tumor  specimens  (data  not  shown),  despite  the  varying  levels  of 
CD8+CD103+  T  cells.  Therefore,  the  frequency  of  CD8+CD103+  T  cells 
in ascites is not correlated with E-cadherin expression by tumors.

Previous studies have indicated that CD103 expression on T cells is
upregulated in response to TGF-β but only when TGF-β is delivered in
conjunction with a TCR-mediated signal [8,15,16]. We therefore assessed
the levels of TGF-β in ascites fluid (Fig. 2). Although TGF-β was not
detected in every sample with elevated frequencies of CD8+CD103+ cells
(IROC030 for example), there was a statistically significant correlation
between the frequency of CD8+CD103+ T cells and levels of TGF-β in ascitic
fluid (p = 0.005, Pearson correlation, two-tailed).
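
As an illustration of the statistic quoted above, the following minimal Python sketch computes a Pearson correlation coefficient and the t statistic on which a two-tailed test of zero correlation is based. The paired values are hypothetical placeholders, not the study data, and the function names are ours.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r != 0; requires |r| < 1."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical values: % CD103+ of CD8+ T cells and TGF-beta level (arbitrary units).
pct_cd103 = [3.0, 10.0, 25.0, 40.0, 55.0, 70.0]
tgf_beta = [0.0, 1.0, 2.5, 3.0, 6.0, 7.5]

r = pearson_r(pct_cd103, tgf_beta)
t = t_statistic(r, len(pct_cd103))
print(f"r = {r:.3f}, t = {t:.2f}, df = {len(pct_cd103) - 2}")
```

The two-tailed p-value follows from comparing t with Student's t distribution on n - 2 degrees of freedom; where SciPy is available, `scipy.stats.pearsonr` returns both the coefficient and a two-sided p-value directly.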

Phenotype  of  CD8+CD103+  T  cells  in  malignant  ascites  of  ovarian  cancer 
patients 

It has been previously demonstrated in other epithelial tumor
settings that CD103 plays an important role in recognition and killing of
tumor targets [8,9]. Therefore, we assessed whether CD103 expression
by malignant ascites CD8+ T cells was associated with an activated
(effector) T cell phenotype (Fig. 3). We focused upon three patients with
the highest percentage of CD8+CD103+ cells (IROC024, IROC033 and
IROC065). The vast majority of CD8+ T cells were negative for
expression of the activation markers CD25 and CD137 and the
chemokine receptor CCR7, regardless of whether they belonged to the
CD8+CD103- or CD8+CD103+ subset. However, CD8+CD103+ T cells
in all three patients expressed higher levels of HLA-DR on the cell surface
than did CD8+CD103- cells, consistent with CD8+CD103+ cells being in
an activated state. In addition, CD8+CD103+ and CD8+CD103- T cells
could be readily discriminated based upon their pattern of CD27 and
CD28 expression. Although all ascites CD8+ T cells were generally
positive for CD27 expression, CD8+CD103- T cells tended to be CD28+
whereas CD8+CD103+ T cells were either CD28- or a mixture of CD28+
and CD28- cells. This pattern of CD27 and CD28 expression, in
combination with other markers of differentiation state (CD45RO+,
CD45RA-, CD62L-, data not shown), is consistent with CD8+CD103+
cells being antigen-experienced and belonging to an intermediate
effector memory population previously designated 'EM2' [17].

Prior studies have indicated that CD103 expression can also be a
characteristic of CD8+ T cells with regulatory functions [6,18,19],
including production of IL-10. Thus, we assessed IL-10 and IFN-γ
production by CD8+CD103- versus CD8+CD103+ T cells in a bulk
ascites cell preparation containing a mixture of CD8+CD103- cells and
CD8+CD103+ T cells (Fig. 4A). In the absence of specific stimulation,
CD8+CD103+ T cells appeared to constitutively express a low amount of
IL-10 whereas CD8+CD103- T cells produced neither IL-10 nor IFN-γ
(Fig. 4B). When activated with a polyclonal stimulus (PMA + ionomycin),
both the CD8+CD103- and CD8+CD103+ T cells produced IFN-γ.
In addition, CD8+CD103+ T cells continued to produce IL-10 in response
to PMA + ionomycin; however, the level of IL-10 did not increase beyond
constitutive levels. It is interesting to note that although this experiment
was performed with a bulk ascites preparation containing a mixture of
CD8+CD103- and IL-10-producing CD8+CD103+ T cells, CD8+CD103-
cells were still capable of producing IFN-γ when stimulated, suggesting
that CD8+CD103+ T cells did not fully abrogate the effector function of
neighboring CD8+CD103- cells. However, PMA + ionomycin is a potent
stimulus that may override more subtle inhibitory effects.

Fig. 2. The percentage of CD8+CD103+ T cells is positively correlated with TGF-β levels
in ascites fluid. Ascites collected from patients described in Fig. 1 was centrifuged to
remove cellular components and the level of TGF-β in ascites fluid was measured by
cytokine capture ELISA. Data are reported as the combination of latent plus active TGF-β
(mean of triplicate wells plus SEM) (Y-axis) and the percentage of CD103+ CD8+ T cells
(X-axis) for individual patients from Fig. 1.

Comparison of TCR Vβ usage by CD8+CD103+ and CD8+CD103- T cell
subsets in malignant ascites

To further assess whether CD8+CD103+ cells might constitute
tumor-reactive cells, we looked for evidence of skewing of the T cell
receptor (TCR) repertoire in this T cell subpopulation. Ascites samples
from patients IROC024, IROC033 and IROC065 were stained with CD8-
and CD103-specific antibodies in combination with a panel of
multiplexed antibodies specific for 24 different human TCR Vβ chains, which
together account for approximately 70% of the human TCR Vβ gene
repertoire. As shown in Fig. 5, the CD8+CD103- and CD8+CD103+ T cell
subpopulations showed distinct patterns of TCR Vβ usage. For example,
in patient IROC024, two predominant Vβ subsets (Vβ4 and Vβ7.2)
comprised approximately 36% of the CD8+CD103+ subset, but only 6% of
the CD8+CD103- subpopulation. Likewise, in patient IROC033, cells
expressing Vβ1 and Vβ20 together comprised approximately 26% of the
CD8+CD103+ subset, but less than 6% of the CD8+CD103- subpopulation.
Even more strikingly, in patient IROC065 the CD8+CD103+ subset
was comprised almost exclusively (>85%) of cells expressing Vβ17.
Together these results imply that, based upon their distinct TCR
repertoires, the CD8+CD103+ and CD8+CD103- T cell subsets have
distinct antigen specificities.

CD8+  T  cells  specific  for  the  tumor  antigen  NY-ESO-1  fall  predominantly 
in  the  CD8+CD103+  subset 

Unlike other tumor sites such as melanoma, there are relatively
few defined T cell antigens in EOC, which limits the ability to
characterize tumor antigen-specific T cells. However, we and others
have reported that a subset of EOC patients has robust CD8+ T cell
responses to the cancer-testis antigen NY-ESO-1 [20]. In the present
cohort, one HLA-A2+ patient (IROC013) demonstrated a strong CD8+
T cell response to an HLA-A2-restricted epitope of NY-ESO-1
(NY-ESO-1 157-165). This response was readily detected by ELISPOT analysis
of ascites-derived T cells without the need for ex vivo expansion
(Fig. 6A). This same patient also had a high frequency of CD8+CD103+
T cells in malignant ascites (Fig. 6B). We hypothesized that
NY-ESO-1-reactive CD8+ T cells might be restricted to the CD103+ subset.
Indeed, by labeling T cells with HLA-A2/NY-ESO-1 157-165 pentamers,
we found that ~75% of NY-ESO-1 157-165 reactive CD8+ T cells
expressed CD103 (Fig. 6C).

Co-engagement of TGF-β receptor and T cell receptor is required for
induction and maintenance of CD103 expression by ascites-derived CD8+
T cells

To directly test whether TGF-β could induce CD103 expression on
CD8+ T cells and to assess the requirement for TCR co-engagement, T
cells from malignant ascites of patients with either a low (IROC008) or
high (IROC033) initial frequency of CD8+CD103+ T cells (Figs. 7A and B,
respectively) were expanded in vitro in the presence or absence of
TGF-β. Bulk ascites cells were subjected to a conventional ex vivo TIL
expansion protocol comprised of an initial round of T cell expansion in
high dose IL-2 (6000 U/ml) followed by a subsequent round of rapid
expansion (REP) using anti-CD3 antibody (OKT-3) [21]. Both phases of
Fig. 3. Surface phenotype of CD8+CD103+ T cells. Malignant ascites samples from three patients with elevated frequencies of CD8+CD103+ T cells (IROC024, IROC033 and IROC065)
were surface labeled with antibodies to CD8 (gated, not shown), CD103 (X-axis), and the indicated T cell activation markers (Y-axis) and analyzed by flow cytometry. Note that
CD8+CD103- and CD8+CD103+ T cells showed distinct expression patterns for CD28 and HLA-DR (CD8+CD103- cells were CD28+ and HLA-DRlo, whereas CD8+CD103+ cells
were CD28+/- and HLA-DRhi).
Fig. 4. Cytokine production by CD8+CD103+ T cells. Malignant ascites cells from a representative EOC patient with elevated frequencies of CD8+CD103+ T cells (IROC033) were
analyzed by flow cytometry using antibodies specific for CD8 and CD103 (A), and by intracellular cytokine staining using antibodies specific for IFN-γ and IL-10 (B). In panel B, cells
were analyzed after incubation in media only (unstimulated) or after 6 h of stimulation with polyclonal stimulus (PMA + ionomycin). Samples were gated on lymphocyte
populations by forward and side scatter. In panel B samples were further gated into CD8+CD103- and CD8+CD103+ subpopulations.


expansion were performed in the presence or absence of exogenous
TGF-β. CD8+ T cells retained their initial CD103 phenotype during the
IL-2-mediated phase of expansion, regardless of whether or not TGF-β
was present in the medium (data not shown). However, CD8+ T cells
derived from the ascites sample with a high initial CD8+CD103+
frequency (IROC033) retained CD103 expression when expanded
with anti-CD3 in the presence of TGF-β (Fig. 7D, right column), but
rapidly lost CD103 when expanded with anti-CD3 in the absence of
TGF-β (Fig. 7D, left column). In the ascites sample that initially had low
proportions of CD8+CD103+ T cells (IROC008), the addition of
exogenous TGF-β during anti-CD3 expansion resulted in the emergence
of a population of CD8+CD103+ T cells (Fig. 7C, right column) whereas
cells remained CD103- when stimulated in the absence of TGF-β
(Fig. 7C, left column). Together, these data demonstrate that TGF-β is a
key regulator of CD103 expression on ascites CD8+ T cells, but that
TGF-β-dependent up-regulation of CD103 surface expression is highly
dependent upon concurrent co-stimulation through the TCR.

Lastly, to directly assess whether CD103 expression may be a
potential marker of tumor-reactive CD8+ T cells in ascites, we
compared the in vitro expanded T cells from IROC008 versus IROC033
for their ability to recognize autologous tumor cells. After a 14-day
period of in vitro expansion, CD8+ T cells from both IROC008 and
IROC033 became quiescent and were negative for intracellular IFN-γ
staining in the absence of stimulation (Figs. 7C and D, upper panels)
and were positive for intracellular IFN-γ staining when re-stimulated
with anti-CD3, regardless of whether they were expanded in the
presence or absence of TGF-β, or their CD103 status (Figs. 7C and D,
middle panels). However, when stimulated with autologous tumor
cells, only those T cells derived from the ascites sample that initially
had high CD8+CD103+ T cells in vivo (IROC033) produced IFN-γ
(Figs. 7C and D, lower panels). Thus, even though CD103 expression
could be induced on CD8+ T cells from both patients by ex vivo
expansion with anti-CD3 plus TGF-β, only the ascites sample that
originally contained CD8+CD103+ T cells was tumor-reactive.

Discussion 

We  report  that  a  subset  of  patients  with  high-grade  serous  EOC  has 
profoundly  elevated  numbers  of  CD8+CD103+  T  cells  in  malignant 
ascites.  CD8+CD103+  T  cells  exhibit  an  antigen-experienced,  effector 
memory phenotype and utilize a distinct TCR Vβ repertoire compared
to CD8+CD103- T cells. In addition, T cells from a malignant ascites
sample  with  known  reactivity  to  the  defined  tumor  antigen  NY-ESO-1 
were  predominantly  of  the  CD8+CD103+  subset.  To  our  knowledge, 
this  is  the  first  report  that  CD8+CD103+  T  cells  are  present  in  ovarian 
cancer  and,  indeed,  in  the  ascites  compartment  under  any  pathological 
condition.  Furthermore,  our  results  suggest  CD103  may  serve  as  a 
marker  for  isolating  tumor-reactive  T  cells  for  immunotherapy. 
Fig. 5. TCR Vβ repertoire of CD8+CD103- and CD8+CD103+ T cell populations.
Malignant ascites samples from three EOC patients with elevated frequencies of CD8+
CD103+ T cells (IROC024, IROC033 and IROC065) were surface labeled with
antibodies to CD8 and CD103 followed by a panel of multiplexed antibodies specific for
24 different TCR Vβ family members. Data are shown as the percentage of cells
expressing a specific TCR Vβ family member after gating into CD8+CD103- and CD8+
CD103+ subpopulations.


Expression  of  CD103  is  normally  restricted  to  CD8+  intraepithelial 
T  cells  (IEL)  present  in  mucosal  surfaces,  where  it  mediates  interaction 
with  epithelial  cells  via  binding  to  its  only  known  ligand,  E-cadherin 
[1,2].  CD103  mediates  interactions  between  T  lymphocytes  and 
epithelial  cells  via  several  mechanisms:  1)  by  facilitating  cell  to  cell 
adhesion  [22,23],  2)  through  CD103-mediated  signaling  to  the  T  cell  to 
promote  T  cell  proliferation  and  effector  function  [9,24],  and  3)  by  E- 
cadherin-mediated  signaling  to  the  epithelial  cell  [25].  CD103  is  also 
expressed  on  mucosal  dendritic  cells  [26,27]  and  on  some  CD4+CD25+ 
T  regulatory  cells  [28,29]. 

The  prognostic  significance  of  the  CD8+CD103+  T  cell  population 
in  EOC  ascites  is  currently  unknown  and  will  require  analysis  of  a 


Fig. 6. CD8+ T cells specific for the tumor antigen NY-ESO-1 are predominantly CD103+.
Panels A-C show results for T cells derived from the malignant ascites of an HLA-A2+
EOC patient (IROC013) who demonstrated reactivity to an HLA-A2-restricted epitope
of NY-ESO-1 (NY-ESO-1 157-165). (A) Representative single wells (of triplicates) from an
IFN-γ ELISPOT assay in which bulk malignant ascites cells were incubated overnight
with NY-ESO-1 157-165 peptide. Negative controls include media alone, or an unrelated
HLA-A2-binding peptide (MelanA 26-35). (B) Flow cytometric analysis showing the
frequency of CD8+CD103+ T cells. (C) Flow cytometric analysis showing the frequency
of CD8+ lymphocytes staining positive with an HLA-A2/NY-ESO-1 157-165 pentamer
(Proimmune). Pentamer-positive and -negative CD8+ T cells were further characterized
for CD103 expression (right panels). All events in panels B and C were first gated on total
lymphocyte populations by forward and side scatter.


larger patient cohort with longer follow-up. Nonetheless, there is
abundant evidence that the presence of intra-tumoral CD8+ T cells in
general, and intra-epithelial CD8+ T cells in particular, is associated
with long-term survival of serous EOC patients [30-32]. Notably, in
these studies intraepithelial T cells have been characterized based on
their histological location in tumor tissue, rather than on the basis of
specific surface markers. Thus, little is known regarding their antigen
specificity or tumor homing properties. Unfortunately, there is
currently no anti-CD103 antibody available that can be used for
paraffin-embedded, formalin-fixed tissue, making it difficult to
compare the CD8+CD103+ T cells identified herein by flow cytometry
with intraepithelial CD8+ T cells identified in large retrospective
immunohistochemical studies. However, as shown in Fig. 1, the
relative frequency of CD103-expressing CD8+ T cells in bulk ascites and
in primary tumor tissue appears to be similar. Furthermore, the
results reported here concerning the activation phenotype, TCR Vβ
repertoire, and tumor antigen specificity of CD8+CD103+ T cells in
ascites strongly support the hypothesis that CD103 may demarcate
the tumor-reactive intraepithelial CD8+ T cell subset associated with
long-term survival. If this hypothesis is correct, then these cells
warrant further study for antigen discovery and immunotherapy.
Fig. 7. CD103 surface expression on CD8+ T cells can be regulated by inclusion of TGF-β in culture media during the TCR-dependent phase of ex vivo TIL expansion. (A) Bulk ascites
cells from two patients with either low (IROC008) or high (IROC033) frequencies of CD8+CD103+ T cell populations were expanded using a conventional ex vivo TIL expansion
protocol comprised of an initial round of expansion in high dose IL-2 followed by a subsequent rapid expansion using anti-CD3 antibody. Both the IL-2 and REP expansions were
performed in the presence or absence of exogenous TGF-β (2 ng/ml). At the end of the rapid expansion period, cells were analyzed by intracellular cytokine staining to assess the
ability of CD8+CD103- and CD8+CD103+ T cells to produce IFN-γ after re-stimulation with media only, polyclonal stimulation (immobilized anti-CD3 antibody) or autologous
tumor cell lines. All events shown in C and D are first gated on CD8+ lymphocyte populations.


Prior studies have shown that TGF-β can upregulate CD103
expression on T cells only if delivered in conjunction with a
TCR-dependent signal [8,16] and that such dual signals normally promote
IEL retention in mucosal environments [22,33]. These data provide
further evidence supporting our hypothesis that CD8+CD103+ T cells
represent tumor-specific lymphocytes, as CD103 expression would
only be expected to be elicited by concurrent stimulation with TCR
and TGF-β. Indeed, we demonstrate herein that by comparing TIL
cultures derived from two EOC ascites samples with high and low frequencies of
CD8+CD103+ T cells, only the TIL culture derived from the CD103
high ascites sample was capable of recognizing autologous tumor. A
recent study by Ling et al. [8] shows that the presence of TGF-β may be
particularly important during the initial priming of naive T cells, and
that T cells primed in the presence of TGF-β require much lower doses
of TGF-β to upregulate CD103 during subsequent antigen exposures.
These data suggest that TGF-β could potentially be used in TIL cultures
to upregulate CD103 expression by CD8+ T cells, thereby enhancing
their homing to epithelial tumors after adoptive transfer.
Interestingly, despite the fact that TGF-β is widely regarded as an
immunosuppressive cytokine, a recent study demonstrated that TGF-β can
actually augment the preferential outgrowth of melanoma-reactive
cells during the in vitro expansion of TIL for adoptive immunotherapy
[34]. Although CD103 was not measured in that study, based upon our
results we anticipate that many of the melanoma-reactive CD8+ T cells
expanded in this protocol may have expressed CD103.

In  conclusion,  we  have  demonstrated  that  the  malignant  ascites 
from  a  significant  proportion  of  EOC  patients  contains  a  high 
frequency  of  CD8+CD103+  T  cells,  which  bear  the  hallmarks  of 
naturally arising, tumor-specific lymphocytes. These cells warrant
further investigation to enhance our understanding of the
immunobiology, prognosis and treatment of this devastating disease.
ABSTRACT

Motivation:  T-cell  receptor  (TCR)  diversity  in  peripheral  blood 
has  not  yet  been  fully  profiled  with  sequence  level  resolution. 
Each  T-cell  clonotype  expresses  a  unique  receptor,  generated  by 
somatic  recombination  of  TCR  genes  and  the  enormous  potential  for 
T-cell  diversity  makes  repertoire  analysis  challenging.  We  developed 
a  sequencing  approach  and  assembly  software  (immuno-SSAKE  or 
iSSAKE)  for  profiling  T-cell  metagenomes  using  short  reads  from  the 
massively  parallel  sequencing  platforms. 

Results: Models of sequence diversity for the TCR β-chain CDR3
region were built using empirical data and used to simulate, at
random, distinct TCR clonotypes at 1-20 p.p.m. Using simulated
TCRβ (sTCRβ) sequences, we randomly created 20 million 36 nt
reads having 1-2% random error, 20 million 42 or 50 nt reads
having 1% random error and 20 million 36 nt reads with 1% error
modeled on real short read data. Reads aligning to the end of
known TCR variable (V) genes and having consecutive unmatched
bases in the adjacent CDR3 were used to seed iSSAKE de novo
assemblies of CDR3. With assembled 36 nt reads, we detect over
51% and 63% of rare (1 p.p.m.) clonotypes using a random or
modeled error distribution, respectively. We detect over 99% of
more  abundant  clonotypes  (6  p.p.m.  or  higher)  using  either  error 
distribution.  Longer  reads  improve  sensitivity,  with  assembled  42  and 
50  nt  reads  identifying  82.0%  and  94.7%  of  rare  1  p.p.m.  clonotypes, 
respectively.  Our  approach  illustrates  the  feasibility  of  complete 
profiling  of  the  TCR  repertoire  using  new  massively  parallel  short 
read  sequencing  technology. 

Availability:  ftp://ftp.bcgsc.ca/supplementary/iSSAKE 
Contact:  rwarren@bcgsc.ca 

Supplementary  information:  Supplementary  methods  and  data  are 
available  at  Bioinformatics  online. 
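
To make the read simulation summarized in the Results concrete, here is a minimal Python sketch that samples fixed-length reads from a template sequence and introduces substitution errors at a uniform per-base rate. It is an illustration under stated assumptions, not the iSSAKE simulation code: the template is a random hypothetical sequence, the function name and parameters are ours, and the error model based on real short read data is not reproduced.

```python
import random

BASES = "ACGT"

def simulate_reads(template, read_len, n_reads, error_rate, rng):
    """Sample fixed-length reads from template with uniform substitution errors."""
    if read_len > len(template):
        raise ValueError("read_len exceeds template length")
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(template) - read_len + 1)
        read = []
        for base in template[start:start + read_len]:
            if rng.random() < error_rate:
                # Substitute with one of the three other bases.
                base = rng.choice([b for b in BASES if b != base])
            read.append(base)
        reads.append("".join(read))
    return reads

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical 150 nt template standing in for one simulated TCR beta sequence.
template = "".join(rng.choice(BASES) for _ in range(150))
reads = simulate_reads(template, read_len=36, n_reads=1000, error_rate=0.01, rng=rng)
print(len(reads), "reads of length", len(reads[0]))
```

Setting `error_rate` to 0 returns exact substrings of the template, which is a convenient sanity check before adding errors.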


1  INTRODUCTION 

Recognition of MHC (major histocompatibility complex)-presented
antigen by the T-cell receptor (TCR) is a pivotal process in
cell-mediated adaptive immunity. A vast TCR repertoire is required
to recognize the enormous diversity of potential antigens in the
environment. TCRs are heterodimers that consist predominantly
(90-99%) of an α and a β subunit (reviewed in Lefranc and Lefranc,
2001), the remainder consisting of γ-δ heterodimers. Each chain


-  To  whom  correspondence  should  be  addressed. 


(a TCR subunit is typically referred to as a chain) originates from
the genetic rearrangement of a variable (V), joining (J) and constant
(C) gene segment (Gascoigne et al., 1984; Hedrick et al., 1984).
Rearranged TCRβ DNA also includes a short (12-16 nt) diversity
(D) gene segment between the V and J gene (Fig. 1; Kavaler
et al., 1984). At the molecular level, two main mechanisms
contribute to generate the immense TCR sequence repertoire. Akin
to immunoglobulins, the combinatorial diversity of TCR arises from
the genetic rearrangement of V, D and J gene segments (Sakano et al.,
1979) and yields ~5.8 x 10^6 possible TCRαβ gene combinations
(Janeway et al., 2001). Further diversity is generated during this
rearrangement by an additional mechanism of base addition and
deletion at the junction of V, (D) and J segments, and is known
as the N-diversity (Huck et al., 1988). Addition of nucleotides by
terminal deoxynucleotidyl transferases at the V-J (α) or V-D-J (β)
junction (Landau et al., 1984) occurs at random and is frequently
preceded by base deletion at the 3' end of V, the 5' end of J and at both
ends of D. This junctional diversity alone can generate ~2 x 10^11
distinct molecules, bringing the number of theoretically possible
TCRαβ to ~10^18 (Janeway et al., 2001). The actual number of
unique T-cell clonotypes in human blood is at least ~10^7 (10^6
β-chains; Arstila et al., 1999). The amino acids encoded at the V-(D)-J
junction, a region known as the third complementarity determining
region (CDR3), are principally responsible for antigen recognition
and define unique TCR clonotypes (Gorski et al., 1994). Together,
these somatic genome alterations create a diverse T-cell metagenome
in every individual.
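
The order-of-magnitude estimate above can be checked with a few lines of Python. This is only a worked multiplication of the figures quoted in the text (the variable names are ours); it shows that the combinatorial and junctional terms multiply to roughly 1.2 x 10^18, and that the lower bound of ~10^7 clonotypes observed in blood is a very small fraction of that theoretical space.

```python
# Figures quoted in the text above (approximate).
combinatorial = 5.8e6   # possible TCR alpha-beta gene combinations
junctional = 2e11       # distinct molecules from junctional diversity alone
observed = 1e7          # lower bound on unique clonotypes in human blood

theoretical = combinatorial * junctional
fraction = observed / theoretical

print(f"theoretical diversity ~ {theoretical:.2e}")
print(f"observed / theoretical ~ {fraction:.1e}")
```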

Profiling  the  cellular  immune  response  to  immune  challenge 
by,  for  example,  vaccination,  transplantation,  infection  or 
cancer  provides  valuable  insights  into  immune  system  integrity 
and  function  and  the  efficacy  of  prophylactic  or  therapeutic 
interventions.  Unfortunately,  the  TCR  diversity  is  such  that 
complete  characterization  of  repertoires  still  represents  an  enormous 
challenge.  Current  profiling  methods,  developed  15  years  ago, 
analyze TCR β-chain repertoire complexity based on the CDR3
length diversity within Vβ gene families (Gorski et al., 1994;
Pannetier  et  al.,  1993;  Penitente  et  al.,  2008).  Although  they 
provide  a  global  picture  of  the  repertoire,  these  low-resolution 
PCR-based  spectratypes  do  not  allow  specific  identification  and 
quantification  of  individual  T-cell  clonotypes.  DNA  sequencing 
achieves  higher  resolution,  but  large-scale  sequence  profiling  has 
been  infeasible  previously  due  to  cost.  For  instance,  sampling 
1 million clonotypes (i.e. 10-fold coverage of 1 million 150 nt target
CDR3 sequences in a single individual) with traditional Sanger
sequencing would cost ~$1.5 M. Newer sequencing technologies
capable of producing large numbers of short reads at much lower
cost have emerged in recent years (Bennett, 2004; Holt and Jones,
2008; Margulies et al., 2005) and make affordable TCR sequence
profiling a likely prospect. Currently, sampling a million TCR
clonotypes with the Illumina GAII Analyzer would cost ~1000-fold
less than with Sanger sequencing. On the other hand, next-generation
sequencing technologies have much shorter read lengths
and show appreciable base error (Holt and Jones, 2008). Combined,
these limitations pose a computational challenge for the accurate
and complete sequence reconstruction of specificity-determining
regions.

Fig. 1. Schematic diagram of the ~1 Mb human TCRβ locus on chromosome
7q34, showing the combinatorial gene rearrangement that takes place and the
iSSAKE strategy for assembling CDR3 (inset). The TCRβ locus comprises
a cluster of 54 predicted V genes located distantly from two separate clusters
each with one D and C gene, interspersed with 6 or 8 J genes. At the DNA
level, one of the D genes first recombines with one of the J segments, creating
partially rearranged DJ genes. Second, one of the V genes joins DJ and the
intermediary DNA is deleted. During the gene rearrangements, the random
base addition at the junction of V, D and J and the frequent base deletion
at the 3' end of V and 5' end of the J gene yield the CDR3, a region with
unique immune specificities. Read assembly is preceded by the segregation
of assembly seeds (arrow): reads that align to the 3' end of V with eight
or more consecutive unmatched 3' bases. A possible contiguous sequence
(contig) resulting from that strategy is shown, with the CDR3-encoding
region highlighted in black.

Using randomly generated error-prone short reads from simulated
TCRβ (sTCRβ) sequences, we have developed a strategy for
profiling T-cell metagenomes. The method uses iSSAKE, a modified
version of our previously published short read assembler SSAKE
(Warren et al., 2007), and relies on annotated Vβ gene predictions
to segregate partial 3' alignments and sort corresponding seed
sequences prior to assembly (Fig. 1). In this proof-of-principle
study, we show that the method is over 63% sensitive for rare
1 p.p.m. clonotypes and over 91% sensitive for clonotypes as low
as 2 p.p.m., when using 36 nt reads with 1% randomly distributed
errors. When applying a modeled error distribution to simulated
reads, we show that the sensitivity of the method is reduced to 51%
for the rarest (1 p.p.m.) clonotypes, but is equally sensitive when
clonotype frequencies are above 5 p.p.m. Assembling longer
reads improves the sensitivity of the method.
For instance, the method is nearly 12% and 25% more sensitive in
recovering 1 p.p.m. sTCRβ when using 42 nt and 50 nt long reads
compared to 36 nt. Together with high base accuracy of over 99%,
we show that the majority of CDR3 sequences can be reconstructed
accurately and thus characterized using error-rich short read data.


Table 1. Frequency (f) of base deletion and addition at the CDR3 between
publicly available mRNA sequences and simulated TCRβ

f deleted 3' V bases                f deleted 5' J bases                 f added CDR3 bases^a
Bases  Observed   Simulated         Bases  Observed    Simulated         Bases  Observed   Simulated
       (N = 356)  (N = 220 000)            (N = 1151)  (N = 220 000)            (N = 174)  (N = 220 000)
0      0.194      0.200             0      0.209       0.212             1      0.006      0.007
1      0.160      0.158             1      0.123       0.123             3      0.006      0.006
2      0.098      0.098             2      0.122       0.122             4      0.029      0.031
3      0.118      0.113             3      0.104       0.105             5      0.017      0.019
4      0.160      0.155             4      0.117       0.117             6      0.017      0.019
5      0.118      0.119             5      0.123       0.119             7      0.052      0.056
6      0.070      0.073             6      0.086       0.085             8      0.063      0.067
7      0.045      0.047             7      0.056       0.057             9      0.069      0.073
8      0.022      0.023             8      0.031       0.031             10     0.080      0.084
9      0.008      0.009             9      0.019       0.019             11     0.057      0.059
10     0.006      0.006             10     0.010       0.010             12     0.075      0.076
                                                                         13     0.080      0.081
                                                                         14     0.086      0.085
                                                                         15     0.098      0.096
                                                                         16     0.046      0.046
                                                                         17     0.052      0.049
                                                                         18     0.029      0.028
                                                                         19     0.034      0.033
                                                                         20     0.023      0.021
                                                                         21     0.023      0.021
                                                                         22     0.011      0.010
                                                                         23     0.006      0.005
                                                                         25     0.011      0.010
                                                                         26     0.011      0.005
                                                                         27     0.006      0.005
                                                                         28     0.011      0.010

^a We did not observe addition of 2 or 24 nt within the CDR3.

2  METHODS 

2.1  Modeling TCRβ rearrangements

From the alignments of predicted V and J genes to GenBank TCRβ mRNA,
four independent models of the N-diversity mechanisms were constructed,
representing the frequencies of (i) random base addition at the V-D-J
junction; (ii) base composition at each position where bases were added;
(iii) 3' V base deletion; and (iv) 5' J base deletion. The models are intended
as a guide for simulating distinct sequence clonotypes within the CDR3
region by estimating, using empirical data, expected frequencies of base
addition, deletion and composition (Table 1). These models were used to
construct sTCRβ sequences as described in Supplementary Material. Briefly,
this involved (i) randomly selecting the V and J gene sequences; (ii) deleting
3' V bases; (iii) deleting 5' J bases; and (iv) joining V-D-J with addition of
junction bases.
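
The four construction steps can be illustrated with a short Python sketch. It is not the procedure given in the Supplementary Material: the frequency dictionaries hold only the first few rows of the simulated columns of Table 1 (so the weights are effectively renormalized), added bases are drawn uniformly rather than from the position-specific composition model, D segment selection is left out, and all function and variable names are hypothetical.

```python
import random

# Truncated, illustrative frequency models: {number of bases: weight}.
# Weights are the first rows of the "Simulated" columns of Table 1.
V_DELETION = {0: 0.200, 1: 0.158, 2: 0.098, 3: 0.113}
J_DELETION = {0: 0.212, 1: 0.123, 2: 0.122, 3: 0.105}
N_ADDITION = {1: 0.007, 3: 0.006, 4: 0.031, 5: 0.019}


def draw(model, rng):
    """Draw one key from a {value: weight} model (weights need not sum to 1)."""
    values = list(model)
    weights = [model[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def simulate_stcrb(v_genes, j_genes, rng):
    """Build one simulated junction sequence from lists of V and J sequences."""
    v = rng.choice(v_genes)              # (i) pick V and J at random
    j = rng.choice(j_genes)
    v_del = draw(V_DELETION, rng)        # (ii) bases removed from the 3' end of V
    j_del = draw(J_DELETION, rng)        # (iii) bases removed from the 5' end of J
    n_add = draw(N_ADDITION, rng)        # (iv) random bases added at the junction
    v_trimmed = v[:len(v) - v_del]
    j_trimmed = j[j_del:]
    # Assumption: uniform base composition for the added bases.
    junction = "".join(rng.choice("ACGT") for _ in range(n_add))
    return v_trimmed + junction + j_trimmed
```

Passing a seeded `random.Random` instance as `rng` makes a simulated library reproducible.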

For the simulations, we used only the ~150 nt of simulated sequence
matching ~40 nt upstream of the 3' end of V, spanning CDR3 and J and
ending ~50 nt downstream of the 5' end of C. The process of building sTCRβ
was repeated to generate a library of 1 000 000 total sequences containing
220 000 unique sTCRβ sequences at frequencies ranging from 1 to 20 p.p.m.
(Table 2).

Table 2. Generating sTCRβ clonotypes

Clonotype     p.p.m.  Number of      Number of     Fold coverage
frequency             unique sTCRβ   total sTCRβ   ca.^a
1:1 000 000   1       110 000        110 000       5
1:500 000     2       10 000         20 000        10
1:333 333     3       10 000         30 000        15
1:250 000     4       10 000         40 000        20
1:200 000     5       10 000         50 000        25
1:166 667     6       10 000         60 000        30
1:142 857     7       10 000         70 000        35
1:125 000     8       10 000         80 000        40
1:111 111     9       10 000         90 000        45
1:100 000     10      10 000         100 000       50
1:66 667      15      10 000         150 000       75
1:50 000      20      10 000         200 000       100

^a Approximate coverage calculated using 150 nt sTCRβ template size, 20M 36 nt reads.

2.2  TCRβ-CDR3 reconstruction strategy

From the above 1M sTCRβ sequences, we randomly generated, in three
independent replicate experiments, 20 million 36 nt reads having 1.0, 1.5 or
2.0% randomly distributed errors and aligned them to known TCRβ gene
segments. In addition, we also generated 20M 42 or 50 nt 1% error reads
to assess the effect of read length on assembly and tested the effect of
random (Dohm et al., 2007) versus modeled (using MAQ simutrain and
simulate on real phiX174 Illumina sequences; Heng et al., 2008) 1% read
error distributions. Simulated reads were aligned against Ensembl (Flicek
et al., 2008) TCRβ gene predictions using exonerate (Slater and Birney, 2005;
software parameters used: -bestn 1 -score 1 -percent 0). Reads aligning best
to V genes, at the 3' end and having eight or more consecutive unmatched
bases in 3' were put aside as seeds for a de novo iSSAKE assembly. The
reverse-complemented sequences of reads aligning on the reverse strand
of V predictions were also considered as seeds. The assembly read pool
consisted of unaligned reads, those aligning best to J segments and assembly
seeds. Reads aligning best to V, C or any possible JC junction sequence
combinations were discarded (Fig. 1).
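
The seed selection rule can be illustrated with the following Python sketch. It substitutes exact string matching for the exonerate alignment, so it tolerates no mismatches inside the V portion. The names and the `min_anchor` and `max_v_trim` thresholds are hypothetical; only the requirement of eight or more unmatched 3' bases and the use of reverse complements are taken from the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def unmatched_3prime(read, v_gene, min_anchor=12, max_v_trim=10):
    """Count read bases left unmatched once its 5' part matches the 3' end of V.

    Returns None when the read does not anchor near the 3' end of v_gene.
    """
    for k in range(len(read), min_anchor - 1, -1):
        start = v_gene.rfind(read[:k])       # longest read prefix found in V
        if start == -1:
            continue
        bases_missing_from_v = len(v_gene) - (start + k)
        if bases_missing_from_v <= max_v_trim:
            return len(read) - k
        return None                          # match lies too far upstream in V
    return None


def select_seed(read, v_genes, min_unmatched=8):
    """Return the read (in V orientation) if it qualifies as a seed, else None."""
    for candidate in (read, reverse_complement(read)):
        for v_gene in v_genes:
            overhang = unmatched_3prime(candidate, v_gene)
            if overhang is not None and overhang >= min_unmatched:
                return candidate
    return None
```

A read lying entirely within V has no overhang and is therefore not selected, which mirrors the discarding of reads that align best to V.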

2.3  Targeted  seeded  assemblies  with  iSSAKE 

To create the iSSAKE assembler, modifications were made to the SSAKE
v3.2.1 code base (Warren et al., 2007;
http://www.bcgsc.ca/platform/bioinfo/software/ssake). Notably, the depth
of the prefix tree was increased, augmenting the number of nodes to 15.
This speeds up the assembly process, at the cost of increased memory
requirements, by reducing the search space when considering
reads for extension. The change was necessary because there is
great sequence conservation between the various sTCRβ CDR3 sequences,
which sometimes differ by only one or a few bases.

Since  SSAKE  release  v2.0  (October  2007),  we  have  implemented  the 
approach  for  handling  error-rich  sequencing  data  described  in  Jeck  et  al. 
(2007).  In  essence,  all  overhanging  bases  of  reads  aligning  perfectly  to  a 
seed  sequence  are  considered  for  extension,  using  a  majority  rule  approach 
for  building  consensus  sequences  of  the  overhanging  bases. 

To support the assembly of longer contigs with complete CDR3 without
depleting the read pool, sequences used for extension are re-used. The
assembly terminates only when all seeds have been maximally extended. This
is easily parallelizable on a cluster of computers and permits the assembly
of discrete non-truncated TCRβ CDR3 sequences ending in the same
J segment. Finally, only the 3' extension of seeds was permitted, the assembly
progressing through V, D and J in this order. For each read set, we ran 100
parallel iSSAKE jobs on two dozen 2.66 GHz Quad-Core 64 bit Intel®
Xeon® processors with 15 GB RAM (iSSAKE -m 15 -o 1 -r 0.6).
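
The extension rules described in this section (perfect overlap with the seed, majority-rule consensus of overhanging bases, re-use of reads and 3'-only extension) can be sketched in Python as below. This is a simplified illustration and not the iSSAKE code: it scans the read list linearly instead of using a prefix tree and it adds one consensus base per step. The parameter names `min_overlap`, `min_depth` and `min_ratio` are chosen here for readability; that they correspond to the -m, -o and -r options shown above is an assumption.

```python
from collections import Counter


def extend_once(contig, reads, min_overlap=15, min_depth=1, min_ratio=0.6):
    """Return the consensus base that extends contig by one position, or None."""
    votes = Counter()
    for read in reads:
        # The read prefix must match the contig suffix exactly and leave
        # at least one overhanging base; try the largest overlap first.
        longest = min(len(read) - 1, len(contig))
        for k in range(longest, min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                votes[read[k]] += 1
                break
    if not votes:
        return None
    base, count = votes.most_common(1)[0]
    depth = sum(votes.values())
    if depth < min_depth or count / depth < min_ratio:
        return None                      # no sufficiently supported consensus
    return base


def extend_seed(seed, reads, max_length=1000, **thresholds):
    """Extend a seed in the 3' direction only; reads are never removed."""
    contig = seed
    while len(contig) < max_length:
        base = extend_once(contig, reads, **thresholds)
        if base is None:
            break
        contig += base
    return contig
```

Because reads stay in the pool, several seeds that share a J segment can each be extended through it, at the price of the same read contributing to more than one contig.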

3  RESULTS 

A library of 1 million sTCRβ was generated in silico using models of
TCRβ diversity derived from publicly available mRNA sequences.

Fig. 2. TCRβ CDR3 reconstruction strategy. Reads are aligned against
Ensembl TCRβ gene predictions using exonerate or other short read aligners.
Reads aligning best to V genes at the 3' end and having a user-defined number n
of consecutive unmatched bases in 3' are set aside as seeds for a de novo
iSSAKE assembly. The reverse complements of reads aligning on the reverse
strand are also considered as seeds. The read pool consists of unaligned reads,
those aligning best to J genes and assembly seeds. Reads aligning best to V, C
or any possible JC junction sequence combinations are discarded to reduce
sequence space.

The library consisted of 220 000 distinct sequences present at
frequencies ranging from 1 to 20 p.p.m. (110 000 unique 1 p.p.m.
sequences and 10 000 unique sequences at each frequency from 2 to
20 p.p.m., Table 2). To confirm
that sTCRβ reflect real sequences, we kept track of the simulated
N-diversity changes applied to each sequence and verified that the
frequencies of V and J base deletion as well as CDR3 base addition
in the sTCRβ library were consistent with the frequencies derived
from experimental mRNA sequences (Table 1). Twenty million reads
having 1.0, 1.5 or 2.0% random or 1.0% modeled error and 36,
42 or 50 nt in length were randomly sampled from the sTCRβ
library and assembled in separate experiments. The approximate
read coverage for each 1-20 p.p.m. clonotype ranged from 5- to
96-fold, respectively (Table 2).
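
The quoted coverage range follows from the library and read set sizes. The short Python calculation below is a worked illustration for 36 nt reads; the function name and argument names are hypothetical, while the numbers are those given in the text and in Table 2.

```python
def fold_coverage(ppm, n_reads=20_000_000, read_len=36,
                  library_size=1_000_000, template_len=150):
    """Approximate read coverage of one clonotype present at `ppm` in the library."""
    copies = ppm * library_size / 1_000_000          # copies of the clonotype
    coverage_per_copy = (n_reads * read_len) / (library_size * template_len)
    return copies * coverage_per_copy


for ppm in (1, 2, 20):
    print(ppm, fold_coverage(ppm))
```

Each library copy is covered 4.8-fold on average, which gives about 5-fold coverage at 1 p.p.m. and 96-fold at 20 p.p.m.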

Since we expect the redundancy of short reads derived from V, J
and C to be extremely high and the redundancy over CDR3 to be
very low, a classic de novo assembly where all sequence reads are
used in turn to seed a contig assembly is not suitable. However, the
sequences of human TCRβ genes (V, J and C) are known and well
annotated, which makes feasible a streamlined strategy of seeded
assembly. The assembly seeds we use are sequences that align to
the 3' end of V with eight or more consecutive unmatched 3' bases
in the highly diverse CDR3 region (Fig. 1 inset and Fig. 2). Using
exonerate, averages of 1.755, 1.718, 1.679 and 1.599 million seeds
were identified from sets of reads with random 1.0, 1.5, 2.0 and
1.0% modeled error, respectively (Table 3). The decrease in number
of seeds identified at higher error rates or between random and
modeled error distribution (4.3% and 8.9%, respectively) is due to an
increased number of mismatched bases that prevent read alignment
to the 3' end of the V gene. Selecting seeds before assembly reduces
the sequence space by ~90% and segregates about half of the
20M input read set for contig assembly. This approach considerably
increases the assembly speed and yields only contigs that represent
the CDR3.

Table 3. sTCRβ contig stats from assemblies of 20M randomly generated 36, 42 and 50 nt reads

                                      Number of iSSAKE contigs from triplicate (36 nt) or duplicate (42 and 50 nt) experiments
                                                                                            Unambiguous, but sub-optimal^e
Bases  Error (%)^a  Mean seeds        Short ambiguous^b  Long ambiguous^c  Unambiguous^d      A            B
36     1.0^f        1 599 140 ± 961   498 583 ± 822      36 352 ± 261      1 064 399 ± 400    1098 ± 185   2847 ± 89
36     1.0          1 755 437 ± 404   265 618 ± 245      11 520 ± 500      1 478 299 ± 236    157 ± 19     569 ± 5
36     1.5          1 718 470 ± 890   359 425 ± 882      16 460 ± 325      1 342 585 ± 1558   228 ± 44     827 ± 22
36     2.0          1 678 905 ± 1018  441 317 ± 712      22 018 ± 307      1 215 570 ± 385    302 ± 9      1158 ± 53
42     1.0          2 448 351 ± 1298  369 847 ± 998      21 956 ± 315      2 056 549 ± 614    239 ± 14     818 ± 3
50     1.0          3 119 789 ± 1150  350 076 ± 578      341 628 ± 653     2 428 086 ± 2380   174 ± 12     459 ± 8

A, misassembled contigs; B, contigs having five or more mismatched bases.
^a Random error distribution generated using simulators from Dohm et al. (2007), unless otherwise specified.
^b Too short to unambiguously decipher J and thus CDR3.
^c Contigs > 45 nt, sufficiently long to contain the first 15 bases of J, but base errors/polymorphisms prevent proper identification of the J segment.
^d Captured CDR3 and first 15 bases of J unambiguously.
^e Misassembled contigs are defined here as contigs comprised of reads that belong to distinct sTCRβ. They are identified by looking at discontinuity in the sequence alignment
between the contigs and sTCRβ. Contigs having five or more mismatched bases with the closest sTCRβ are built with erroneous reads that often yield misassembled contigs.
^f Error distribution modeled using phiX174 Illumina sequence data as the training set (Heng et al., 2008).

At 1% randomly distributed error, 84.2 ± 0.16% of the seeds, on
average, yielded contigs that comprised complete CDR3 sequences,
including D segment bases and unambiguous J segment junctions.
These unambiguous contigs, which are defined as having clearly
demarcated V and J boundaries, were subsequently trimmed to keep
the last 15 V bases, the CDR3 and the first 15 bases of identifiable
J sequence. The reason for trimming was to facilitate assessment
by removing bases that were uninformative for characterization
of CDR3. As expected, seeds from 36 nt read sets having higher
error rates or errors modeled on Illumina data yield fewer
unambiguous contigs, with average proportions of 78.1 ± 0.25%
and 72.4 ± 0.10% for the 1.5% and 2% random error sets and
66.6 ± 0.03% for the modeled error read set, respectively (Table 3).
This is because random base errors in the seed sequences cause
premature termination of contig extension by iSSAKE, unless one
or more reads in the pool has matching erroneous bases by chance.
The latter case can lead to (i) misassembled contigs, especially if
different sTCRβ have a very similar CDR3 makeup and (ii) long
ambiguous contigs where J segments are undecipherable.

Misassemblies are identified as contigs that do not match sTCRβ
in  the  source  library.  These  were  rarely  observed  (0.01-0.02%  of 
unambiguous  contigs),  although  more  prevalent  in  the  2%  error 
read  set  (Table  3).  The  effect  of  error  on  contig  misassemblies 
is  more  pronounced  when  error  is  modeled  from  real  Illumina 
data  (Table  3),  where  error  rates  tend  to  increase  toward  the  end 
of  the  read.  However,  even  with  a  modeled  error  distribution, 
misassemblies  still  represent  a  minor  proportion  of  all  unambiguous 
reconstructions  (0.3%). 

We define long ambiguous contigs as those large enough to
contain J segment bases but for which, because of base errors, no
precisely matching J segment could be found. An increase from 1% to 2% in
the error rate nearly doubles the number of long ambiguous contigs,
increasing their proportion from 0.7% to 1.3%. These contigs, while
ambiguous, were still useful for assessing the sensitivity of our
method. Their abundance is not negligible and the contigs still
produce valid alignments to a reference. With real data, identifying
the CDR3 from these contigs without a reference sequence will
prove more challenging. Short ambiguous contigs are defined here
as those that are not long enough to span CDR3. They
are caused by early termination of contig extension due to
base errors and represent a considerable
portion of the total contigs (15% ± 0.01, 21% ± 0.04, 26% ± 0.05
and 31% ± 0.05 of assemblies using 1.0, 1.5, 2.0% random and 1%
modeled error read sets, respectively).

For  each  assembly,  the  average  base  accuracy  was  calculated 
by  counting  the  total  number  of  matching  bases  over  the  aligned 
contig  length.  Although  base  accuracy  of  assembled  contigs  is  lower 
when  simulated  sequence  error  rates  are  higher,  it  is  above  99%  at 
all  clonotype  frequencies  and  error  rates  simulated  (Tables  4-6). 
Contigs  representing  clonotypes  with  the  lowest  frequencies  were 
the  least  accurate.  This  is  not  unexpected  since  at  lower  read  depths, 
there  are  fewer  reads  to  offset  the  base  error,  especially  in  the  highly 
diverse and thus relatively thinly covered CDR3 region. Indeed,
inspection of the base error and coverage of assemblies as a function
of base position over the region of interest reveals that base mismatch
frequency peaks within the seed portion (any of the last 15 V
bases and at least 8 consecutive mismatched bases downstream)
and decreases through J as the base coverage increases (Fig. 3).
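
As an illustration of the accuracy measure defined at the start of this paragraph, the Python sketch below pools matching bases over the aligned length of each contig. It assumes ungapped alignments in which contig and reference start at the same position, which is a simplification, and the function name is hypothetical.

```python
def pooled_accuracy(pairs):
    """Percent of matching bases over all aligned positions.

    `pairs` is an iterable of (contig, reference) strings that are assumed to
    be aligned without gaps from their first base.
    """
    matches = 0
    aligned = 0
    for contig, reference in pairs:
        length = min(len(contig), len(reference))
        aligned += length
        matches += sum(c == r for c, r in zip(contig, reference))
    if aligned == 0:
        raise ValueError("no aligned bases")
    return 100.0 * matches / aligned
```

Pooling the counts before dividing weights each contig by its aligned length, so short contigs do not dominate the average.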

At clonotype frequencies as low as 3 p.p.m., over 93% of the
sTCRβ CDR3 sequences could be characterized by iSSAKE contigs
assembled from the 1% modeled error distribution read set (Table 4).
This means that the sTCRβ sequence diversity can be almost entirely
characterized at 15x coverage (Table 2). Although the scope of real
T-cell diversity remains unknown, if it is close to the estimated lower
limit of 10^6 β-chains, then substantial repertoire coverage should be
easy to attain by massively parallel short read sequencing, even
without trimming the reads.

For all contigs that capture sTCRβ sequences we find the accuracy
to be very high, especially for clonotypes present at 5 p.p.m. or
more. Interestingly, we find that read error has only a small impact
on accuracy at these clonotype frequencies. Seeds with errors will
rarely find a sequence match in iSSAKE, causing premature termination
of contig extension or leading to an increased number of singlets, depleting
the pool of unambiguous contigs. Thus, early rejection of these reads
has a much more significant impact on the sensitivity than it does
on the accuracy.

Table 4. Method sensitivity and accuracy as a function of a 1% read error
distribution (random or modeled)

p.p.m.  Number of sTCRβ-CDR3       Accuracy (%)               Average number of
        characterized by iSSAKE                               contigs characterizing
        contigs (sensitivity)                                 each sTCRβ
        Error (%)                  Error (%)                  Error (%)
        1.0 Random  1.0 Modeled    1.0 Random  1.0 Modeled    1.0 Random  1.0 Modeled
1       70 240      56 564         99.68       99.01          2.0         1.8
2       9111        8100           99.90       99.34          3.7         2.5
3       9747        9295           99.96       99.64          4.6         3.4
4       9883        9721           99.98       99.80          6.0         4.4
5       9932        9874           99.99       99.90          7.6         5.5
6       9936        9913           99.99       99.94          9.1         6.6
7       9935        9936           99.99       99.96          10.6        7.7
8       9939        9948           99.99       99.97          12.2        8.9
9       9948        9955           99.98       99.97          13.7        10.0
10      9956        9958           99.99       99.98          15.2        11.2
15      9972        9975           99.99       99.98          23.0        16.8
20      9955        9958           99.98       99.98          30.7        22.1

Unambiguous and long ambiguous contigs were used for this analysis. Reported values are the mean of triplicate simulations. Variation among simulations was minimal
(Supplementary Table 1).


Table 5. Method sensitivity and accuracy as a function of randomly
distributed error rates

p.p.m.  Number of sTCRβ-CDR3       Accuracy (%)       Average number of
        characterized by iSSAKE                       contigs characterizing
        contigs (sensitivity)                         each sTCRβ
        Error (%)                  Error (%)          Error (%)
        1.5         2.0            1.5      2.0       1.5      2.0
1       64 862      59 259         99.48    99.23     1.9      1.8
2       8779        8423           99.80    99.68     2.9      2.6
3       9656        9520           99.92    99.85     4.1      3.7
4       9869        9831           99.96    99.93     5.4      4.9
5       9929        9920           99.98    99.97     6.9      6.2
6       9936        9928           99.98    99.98     8.2      7.5
7       9934        9924           99.98    99.97     9.7      8.7
8       9940        9929           99.98    99.98     11.0     10.1
9       9952        9944           99.98    99.98     12.5     11.3
10      9953        9940           99.99    99.98     13.9     12.6
15      9971        9966           99.99    99.98     21.1     19.3
20      9961        9944           99.98    99.98     28.2     26.0

Unambiguous and long ambiguous contigs were used for this analysis. Reported values
are the mean of triplicate simulations. Variation among simulations was minimal
(Supplementary Table 1).


Again, it is important to note that in real sequence data, errors tend to
accumulate toward the 3' ends of reads, rather than being equally
distributed along the length of the read. Using error distribution
modeled on real data, we see fewer contigs reconstructed at
each p.p.m., due to fewer seeds being initially identified and more
frequent seed extension failures (Table 4). However, at least for
clonotype frequencies >5 p.p.m., reconstruction success rate is high
and largely unaffected by error distribution. At lower clonotype
frequencies, the reconstruction rate is lower; for instance, 63.8%
versus 51.5% of 1 p.p.m. sTCRβ can be identified when using a
random versus a modeled error distribution, respectively (Table 4).

Table 6. Method sensitivity and accuracy as a function of read length

p.p.m.  Number of sTCRβ-CDR3       Accuracy (%)        Average number of
        characterized by iSSAKE                        contigs characterizing
        contigs (sensitivity)                          each sTCRβ
        Read length (nt)           Read length (nt)    Read length (nt)
        42          50             42       50         42       50
1       90 210      104 155        99.74    99.75      2.4      2.9
2       9737        9914           99.93    99.95      4.3      5.4
3       9911        9959           99.98    99.98      6.4      8.0
4       9937        9963           99.99    99.98      8.5      10.7
5       9948        9966           99.99    99.98      10.7     13.2
6       9944        9966           99.98    99.98      12.8     15.8
7       9941        9974           99.98    99.97      14.8     18.3
8       9948        9975           99.98    99.97      17.0     20.8
9       9954        9979           99.98    99.98      19.1     23.2
10      9960        9983           99.98    99.98      21.1     25.7
15      9973        9985           99.99    99.98      31.1     37.2
20      9958        9976           99.98    99.98      40.0     47.4

Unambiguous and long ambiguous contigs were used for this analysis. Reported values
are the mean of duplicate (42 and 50 nt reads) simulations at 1.0% randomly distributed
errors. Variation among simulations was minimal (Supplementary Table 1).

A change of only 1% in the read base error (from 1% to 2%)
yields a 66% increase in contigs too short to characterize sTCRβ
unambiguously, usually because the J segment is incomplete and/or
its position cannot be identified with certainty. This translates into
a decreased sensitivity of the method of over 10% at 1 p.p.m.,
7% at 2 p.p.m. and 2% at 3 p.p.m. At higher clonotype frequencies
(>4 p.p.m.), the effect of base error on yielding short contigs is offset
by the larger read depth (Table 5).

Fig. 3. Average mismatched base frequency and mean contig base coverage
per position on trimmed, normalized, unambiguous contigs from triplicate
36 nt 1% random error read assemblies. Assembly base mismatch frequency
(i.e. assembly errors) reaches a maximum within the seed portion (any of the
last 15 V bases and at least eight consecutive mismatched bases downstream)
and decreases through J as the base coverage increases. The sudden increase
in average fold coverage beginning at approximately base position 30 is
explained by oversampling of a limited number (14) of J segments, and the
re-use of reads by the iSSAKE assembly algorithm. The position of the V
segment and approximate position of the CDR3 and J gene segments on the
contigs is shown on top of the graph and is depicted by the rectangles. Every
contig is comprised of the last and first 15 nt of the V and J gene segment,
respectively. The CDR3 and J gene segment boundaries are approximate
because the length of the CDR3 varies.

The robustness of assemblies of higher frequency clonotypes
is further enhanced because more seeds are available at the start
of assembly. When assembled, these seeds should, in theory, lead
to contigs that characterize the same TCRβ. Keeping track of the
average number of contigs that characterize each sTCRβ generated
allows one to estimate the frequency of any given sTCRβ in the
sample. Consistently, at the error rates tested, there is an almost
perfect Pearson correlation (0.9998, 0.9997 and 0.9994 at 1, 1.5
and 2% random error, respectively, and 0.9995 for the 1% modeled
error set) between the average number of sTCRβ-capturing contigs
and the frequency of that sTCRβ. Since the number of seeds (and
thus, contigs) identified per TCRβ varies linearly as a function of read
coverage, as opposed to having a 1:1 relationship with the clonotype
frequency, the number of contigs identified cannot be expected to
reflect the exact clonality of each TCRβ in the sample. Instead, the
contig count may be used to estimate relative TCRβ abundance.
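
The relationship between clonotype frequency and contig count can be checked against the rounded means in Table 4 with a few lines of Python. The sketch uses the 1% random error column as it appears in the table. Because those values are rounded, the coefficient it returns need not match the reported one to the last digit, and the variable names are hypothetical.

```python
import math

# Clonotype frequency (p.p.m.) and average number of contigs per sTCRβ,
# copied from the "1.0 Random" column of Table 4.
PPM = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]
MEAN_CONTIGS = [2.0, 3.7, 4.6, 6.0, 7.6, 9.1, 10.6, 12.2, 13.7, 15.2, 23.0, 30.7]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson(PPM, MEAN_CONTIGS)
print(f"Pearson r = {r:.4f}")
```

A coefficient this close to 1 supports using contig counts as a relative, rather than absolute, measure of abundance.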

To explore the effect of read length, we also simulated sets of
20M reads of 42 and 50 nt at 1% random error. These read lengths
and error rates should be achievable by massively parallel short read
platforms, if not currently, then in the near future. Increasing the read
length has a drastic effect on the sensitivity of the method. Detection
of sTCRβ increases by 18% at 1 p.p.m. when 42 nt 1% error reads
are assembled compared to shorter 36 nt reads (Table 6). Using 50 nt
reads for assembly recovers over 94.7% of clonotypes, an increase
of >30% in detection compared to the usage of 36 nt reads with the
same error rate. At 2 p.p.m., the sensitivity of the recovery increases
from 91.1% to 97.4% to 99.1%, using 1% error 36, 42 and 50 nt
reads, respectively. Increased sensitivity is a direct consequence of
obtaining more seeds. With 42 and 50 nt reads, 40% and 77% more
seed sequences could be identified from our set of 20M simulated
reads (Table 3).
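
The sensitivity figures for 1 p.p.m. clonotypes can be recomputed from the counts in Tables 4 and 6, as in the Python sketch below. It treats the quoted increases as differences in percentage points, which is an interpretation rather than something the text states, and the variable names are hypothetical.

```python
UNIQUE_AT_1PPM = 110_000   # unique 1 p.p.m. sTCRβ in the library (Table 2)

# sTCRβ-CDR3 characterized at 1 p.p.m. with 1% random error reads,
# keyed by read length in nt (Tables 4 and 6).
RECOVERED = {36: 70_240, 42: 90_210, 50: 104_155}

sensitivity = {length: 100.0 * count / UNIQUE_AT_1PPM
               for length, count in RECOVERED.items()}
gain_42 = sensitivity[42] - sensitivity[36]   # percentage points
gain_50 = sensitivity[50] - sensitivity[36]

for length in sorted(sensitivity):
    print(f"{length} nt: {sensitivity[length]:.1f}%")
print(f"gain at 42 nt: {gain_42:.1f}, gain at 50 nt: {gain_50:.1f}")
```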

4  DISCUSSION AND CONCLUSION

Technological advances in sequencing (Holt and Jones, 2008)
put large-scale high-resolution TCR profiling within the realm
of possibility. However, shortcomings of these new sequencing
technologies, namely the appreciable sequencing errors and short
read lengths, require computational solutions to help make sense of
the data. We have explored the feasibility of using short 36, 42 and
50 nt error-prone sequences to characterize up to 1 million sTCRβ
sequences. Our strategy for reconstructing sTCRβ relies on two
bioinformatics pillars: short read sequence alignment and seeded
de novo assembly. Unidirectional de novo assemblies of short seeds
targeting the V-D-J junction are made possible using a modified
version of SSAKE (Warren et al., 2007) that handles sequencing
errors, re-uses reads and processes k-mers more rapidly than earlier
versions (http://www.bcgsc.ca/platform/bioinfo/software/ssake).
This strategy is tailored for very short reads, such as those produced
by the Illumina Ltd. sequencing instrument, and constitutes the main
theoretical advance presented in this article.

Sequence  characterization  of  TCRs  and  more  specifically  the 
variable  portion  encoding  amino  acids  that  directly  interact  with 
antigenic  peptide  permits  the  identification  of  disease-associated 
T-cells.  Current  TCR  sequence  profiling  can  at  best  decipher 
hundreds  of  TCRs  (Ozawa  et  al.,  2008;  Zhou  et  al.,  2006),  a  small 
number in comparison with the 10^7 TCR diversity estimated in an
individual  (Arstila  et  al.,  1999).  Larger-scale  profiling  techniques 
that  examine  CDR3  length  heterogeneity  provide  a  global  snapshot 
of  TCR  repertoires,  but  do  not  resolve  individual  clonotypes  at  the 
macromolecular  level  (Gorski  et  al.,  1994;  Pannetier  et  al.,  1993; 
Penitente  et  al.,  2008).  Due  to  the  low  throughput,  high  cost  and  labor 
requirements  of  traditional  Sanger  sequencing,  sequence-profiling 
TCR  on  that  same  scale  has  not  yet  been  explored,  thereby  providing 
the  impetus  for  our  study. 

The  success  of  TCR  sequence  reconstruction  using  short 
sequences  relies  on  the  very  region  that  makes  profiling  the  TCR 
repertoire challenging: the uniqueness and specificity of the CDR3
(Davis  et  al.,  1998).  Selection  of  seed  sequences  that  comprise  bases 
encoding  a  portion  of  the  variable  region  ensures  that  a  streamlined, 
unidirectional  assembly  proceeding  through  the  junction  will  help 
characterize  unique  clones.  This  is  especially  true  if  the  sequence 
coverage  is  10-fold  or  above,  or  the  frequency  of  the  TCR  is 
high,  since  higher  frequencies  result  in  higher  sequence  coverage  of 
discrete  TCRs.  At  low  frequencies,  base  error  has  a  strong  negative 
impact  on  TCR  reconstruction  rates  that  is  due  to  sequence  coverage 
insufficient  to  offset  base  error  in  less  redundant  CDR3-encoding 
regions.  iSSAKE  will  not  extend  a  seed  or  contig  with  a  base  error 
in  a  minimum  set  overlap  region,  unless  that  base  can  be  found  at 
the same position in an overlapping k-mer. This impacts favorably
on  contig  accuracy  at  the  expense  of  reconstruction  rates,  especially 
at  low  1  p.p.m.  frequencies  and  2%  error. 

We examined instances of failure to detect CDR3 sequences
known to exist in our sTCRβ library. The majority (99.4% using
36 nt read sets) of irresolvable low-frequency sTCRβ are attributable
to 3' base errors. The problem is exacerbated for low-frequency
clones because these by definition have lower coverage and therefore
less chance for an error to be mitigated by read redundancy. At all
clonotype frequencies, but more noticeably at higher sequence
coverage, 0.3% of irresolvable sTCRβ are due to high sequence
identity between modeled TCRβ, sometimes differing only by a few
5' V bases and thereby preventing unambiguous characterization of
their sequence. We see additional failure modes that are very rare
(~0.2-0.3% of irresolvable sTCRβ, 0.004% of total sTCRβ) and
associated with shorter (36 and 42 nt) reads and higher frequency
clonotypes (>5 p.p.m.). For example, reads originating from CDR3
may by chance align well to V segments, C segments or JC junctions,
and will as a consequence be removed early in the assembly process
(Fig. 2) such that CDR3s containing these sequences cannot be
assembled despite high read coverage. This failure mode was not
observed with longer 50 nt seeds and is explained by the increased
ability of seeds (which are never discarded) to span the entire CDR3
and capture a portion of J in a single read.

In  time,  accurate  sequence  length  from  next  generation 
sequencing  platforms  will  exceed  the  length  of  CDR3.  However, 
sequence  assembly  will  remain  advantageous  because  it  mitigates 
the  effect  of  sequence  errors  present  in  individual  reads.  In 
the  present  study,  fewer  CDR3  sequences  could  be  identified 
unambiguously  with  the  unassembled  50  nt  read  set  compared  to 
the  assembled  one  (533  868  versus  2  428  086  unambiguous  contigs, 
Supplementary  Table  1).  Already,  a  typical  Illumina  Sequence 
Analyzer  run  will  output  more  bases  than  we  generated  in  this  study, 
at  equal  or  lower  error  rates.  This  suggests  that  the  overall  strategy, 
as  outlined,  will  work  with  sequence  data  from  biological  samples. 
We  expect  it  will  also  be  applicable  to  similar  metagenomics 
projects  including  sequence-characterization  of  the  immunoglobulin 
repertoire.

T-cell receptor (TCR) genomic loci undergo somatic V(D)J recombination, plus the addition/subtraction of nontemplated
bases at recombination junctions, in order to generate the repertoire of structurally diverse T cells necessary for antigen
recognition. TCR beta subunits can be unambiguously identified by their hypervariable CDR3 (Complementarity Determining
Region 3) sequence. This is the site of V(D)J recombination encoding the principal site of antigen contact. The complexity
and dynamics of the T-cell repertoire remain unknown because the potential repertoire size has made conventional
sequence analysis intractable. Here, we use 5'-RACE, Illumina sequencing, and a novel short read assembly strategy to
sample CDR3β diversity in human T lymphocytes from peripheral blood. Assembly of 40.5 million short reads identified
33,664 distinct TCRβ clonotypes and provides precise measurements of CDR3β length diversity, usage of nontemplated
bases, sequence convergence, and preferences for TRBV (T-cell receptor beta variable gene) and TRBJ (T-cell receptor beta
joining gene) gene usage and pairing. CDR3 length between conserved residues of TRBV and TRBJ ranged from 21 to 81
nucleotides (nt). TRBV gene usage ranged from 0.01% for TRBV17 to 24.6% for TRBV20-1. TRBJ gene usage ranged from 1.6%
for TRBJ2-6 to 17.2% for TRBJ2-1. We identified 1573 examples of convergence where the same amino acid translation was
specified by distinct CDR3β nucleotide sequences. Direct sequence-based immunoprofiling will likely prove to be a useful
tool for understanding repertoire dynamics in response to immune challenge, without a priori knowledge of antigen.

[Supplemental material is available online at http://www.genome.org. The TCRβ cDNA sequence and quality-score files have been
submitted to the NCBI Short Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi) under accession no. SRA008633.]


T-cell receptors (TCRs) are dimeric (αβ or γδ), highly variable T
lymphocyte membrane proteins that recognize antigenic peptides
presented on heterologous cells by the major histocompatibility
complex (MHC) (Davis and Bjorkman 1988; Bassing et al. 2002).
Recognition specificity for diverse peptide-MHC (pMHC) complexes
is provided by the three complementarity-determining
regions (CDRs) of the TCR. CDR1 and CDR2 are coded for by
germline sequences while CDR3, the highly polymorphic principal
recognition site, is created when TCR genomic loci undergo
somatic recombination between gene segments during development
of T lymphocytes in the thymus (Gellert 1992, 2002; Jung
and Alt 2004). For the α locus and the γ locus, recombination
occurs between variable (V) and joining (J) segments. For the δ
locus and the β locus, there is recombination between V and J
segments, but also the inclusion of one of two short diversity (D)
segments. The combinatorial diversity of the human β locus is illustrated
in Figure 1A. At CDR3 recombination junctions, further
complexity is generated through the deletion of germline-encoded
bases and the addition of random nontemplated bases. The
resulting hypervariable sequences of the CDR3 make possible the
recognition of diverse peptide-MHC (pMHC) complexes. During
T-cell maturation, all T cells expressing rearranged receptors capable
of binding pMHC with high enough affinity to be biologically
relevant are retained (positive selection), but only T cells with
rearranged receptors that do not interact strongly with self-pMHC
complexes ultimately exit the thymus (negative selection). It
should be noted that V(D)J recombination is not entirely random,
and the prevalence of specific gene segments and combinations of
gene segments shows marked variation in the repertoire. Contributions
to this bias are introduced even before thymic selection,
through variation in the efficiency of recombination of different
gene segments (Manfras et al. 1999; Krangel 2003). The peripheral
blood thus contains a large repertoire of T lymphocytes with the
potential to recognize diverse antigens. Binding of a naive T cell's
TCR to a structurally compatible pMHC on an antigen-presenting
cell will, with the appropriate interaction of co-stimulatory molecules,
initiate rapid clonal expansion to generate a population of
effector cells. This acute response occurs on the order of days, and
is followed by a gradual contraction of the expanded pool over the
course of several weeks, with differentiation into a small number of
long-lived memory cells. Thus, the T-cell repertoire is not static,
but rather is constantly molded by immune challenge (for reviews,
see Nikolich-Zugich et al. 2004; Harty and Badovinac 2008).

There has been remarkable progress in characterizing the
size and dynamics of the T-cell repertoire, but the task remains
daunting due to the enormous combinatorial diversity that is
theoretically possible (>10^15 distinct αβ receptors, or clonotypes
[Davis and Bjorkman 1988; Murphy et al. 2007]) and the limited
power of existing tools for interrogation. Previously, a method
called TCR spectratyping (Pannetier et al. 1993; Gorski et al. 1994)
had been used to probe the T-cell repertoire. This approach
involves the use of V and J gene segment-specific primers for RT-PCR
amplification of the CDR3. In TCR spectratyping, CDR3
amplicons are separated according to size by polyacrylamide gel
electrophoresis. Typically, six or so distinct amplicons are observed
per primer pair, spaced at 3-nucleotide (nt) intervals in accordance

[Figure 1B flowchart: First strand synthesis using gene-specific TRBC primer (template-switching primer included for 5' anchor sequence) -> PCR with template-switching primer and nested TRBC primer -> PCR with nested NotI-tailed primers (aliquot Sanger sequenced) -> NotI digestion, concatenation, shearing -> Gel purification of 100-300 bp fraction; ligation of Illumina adapters -> PCR with Illumina primers; Illumina single-end sequencing]

Figure 1. (A) Representation of the TCRβ locus at human chromosome 7q34. The TCRβ locus spans 620 kb and includes over 50 TRBV genes (green)
belonging to 30 subgroups. There are two TRBC genes (light blue) each downstream from a TRBD (dark blue) and six or seven TRBJs (yellow).
Recombination first occurs between TRBJ and TRBD genes, followed by recombination to a TRBV gene. (Red lines) Addition of nontemplated bases. After
transcription, intervening sequences are spliced out so that a TRBC is adjacent to the recombined V-D-J sequence. Gene width and distances are not to
scale. (Red asterisks under TRBC) Location of primers used for 5'-RACE and PCR reactions. Refer to Supplemental Figure 1A for primer locations. A detailed
locus map can be obtained from IMGT (www.imgt.org/textes/IMGTrepertoire). (B) Flowchart illustrating 5'-RACE and Illumina library construction. For
more detail, please refer to Methods and Supplemental Figure 1.

with reading frame. An experimental estimate of repertoire size
of ~10^6 beta chains in blood has been obtained (Arstila et al. 1999)
by exhaustive Sanger sequencing of a single amplicon from a
TRBV18-TRBJ1-4 spectratype, then extrapolating the observed diversity
according to the relative abundance of this amplicon in the
spectratype and the estimated frequency of TRBV18-TRBJ1-4
pairing in the repertoire. Of course, actual TCR diversity will be
higher still, due to αβ heterodimerization (Fuschiotti et al. 2007;
Ozawa et al. 2008).

Advances in sequencing technology (Holt and Jones 2008;
Shendure and Ji 2008) now permit interrogation of complex
sequencing targets at unprecedented depth and reasonable cost.
Here, we describe a method for deep sampling of the TCR repertoire
at sequence-level resolution. Our approach relies on
massively parallel Illumina sequencing of CDR3β amplification
products and a novel TCR-specific short read assembly strategy
(Warren et al. 2009).

Results 

Experimental  strategy 

We used 5' rapid amplification of cDNA ends (RACE) to obtain
CDR3β transcript sequences from a commercially available mRNA
sample prepared from normal human peripheral blood leukocytes
(PBL) pooled from 550 individuals (Fig. 1B; Supplemental Fig. 1).
Peripheral blood from different individuals will include different
frequencies of naive and memory T cells. Because individual memory
repertoires are skewed due to historical antigen encounter and
the individual's HLA type, our results do not reflect the expected
repertoire of any individual, but rather are reflective of average
clonotype abundance in a population.

The RACE approach avoids the potential bias associated with
the use of the multiple primer sets required to amplify from all
TRBV sequences (Boria et al. 2008) and takes advantage of the
conserved sequences offered by TRBC1 and TRBC2 (96% nucleotide
sequence identity). Reverse transcription to generate cDNA
was performed using a primer specific for the TRBC genes (Ozawa
et al. 2008) as well as a template-switching primer (Peters et al.
1999; Douek et al. 2002) to provide a 5' anchor for subsequent
PCR. First-round PCR reactions with a nested TRBC primer and the
template-switching primer produced a high level of background
amplification. A second round of PCR using nested primers was
performed to obtain a cleaner product of ~520 bp. (See Methods
for primer sequences and Supplemental Fig. 1A for TRBC primer
locations.) The RACE product was then gel-purified and an aliquot
was cloned and Sanger sequenced to confirm the presence of
CDR3β amplicons. The RACE product was too long to directly sequence
the CDR3β region with short-read technology, so it was
ligated to produce concatamers that were then sheared by sonication.
A 100- to 300-bp size fraction was isolated by PAGE
and shotgun-sequenced on the Illumina platform (www.illumina.com).
The initial sequencing runs generated 18,829,563 36-nt
reads. During the course of this analysis, a protocol to produce
longer read lengths became available, so a further 21,752,666 50-nt
reads were generated and analysis was performed on the pooled set
of 40,582,229 reads (Table 1).

iSSAKE assembly and analysis of reconstructed TCRβ sequences

Table 1. Sequencing and assembly statistics

Total reads                                    40,582,229
Seed sequences                                 310,614
Total CDR3β sequences assembled (a)            117,052
Total clonotypes (distinct CDR3β sequences)    33,664
Clonotypes with an unambiguous TRBV segment    22,704

(a) Complete CDR3β sequences in correct reading frame.

We have recently described a system for profiling TCR diversity
using short sequence reads and the assembly software package we
call iSSAKE (immuno-Short Sequence Assembly by K-mer search
and 3' read Extension; Warren et al. 2009). Sequence reads that
have homology with a TRBV gene segment but have unmatched
bases at their ends (corresponding to the beginning of the recombined
CDR3β sequence) are used as seeds to initiate directional,
de novo CDR3β assemblies, as described in the Methods
section. Briefly, with iSSAKE, reads from the shotgun data set derived
from the CDR3β amplicon are optimally aligned to each seed
to generate CDR3β sequence contigs. An important feature of
iSSAKE is that reads can be reused, which allows the assembly of
different CDR3β sequences that end in the same TRBJ segment.
However, this also means that read depth is not uniform
throughout CDR3β and cannot be used to determine clonotype
frequency. Depth is exceptionally high over the TRBJ segment,
where only 14 TRBJ possibilities exist. Some seeds are themselves
long enough to cover most or all of the CDR3β-encoded bases
without being significantly extended by other reads, so read depth
in these instances may be low. For the current study, we set the
assembly parameters to only extend a contig when two or more
reads aligned at each base position (-o 2). Further, only reads with
overlaps 15 bases or longer (-m 15) were considered. Read error
was mitigated using the iSSAKE internal error-handling algorithm
and consensus bases were called when bases from reads agreed
70% of the time or more (-r 0.7). The output comprises contiguous
sequences (contigs) that contain the last 15 nt of TRBV, any nontemplated
and TRBD bases, and the first 15 recognizable TRBJ
segment bases. Upon completion of assembly, contigs are compared
and those that have matching sequence, and therefore represent
the same beta-chain clonotype, are grouped together. The
number of matching contigs for a given clonotype provides the
relative frequency of that clonotype in the original sample. From
previous simulations we know that contig depth determined in
this manner is proportional to clonotype frequency (correlation
coefficient >0.999; Warren et al. 2009).
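
The extension rule described above can be illustrated with a small sketch. This is not the iSSAKE implementation; it is a simplified, hypothetical 3' extension loop that only mirrors the three parameters named in the text (minimum overlap, minimum depth, minimum base ratio). The function name `extend_contig`, the exact-match overlap test, the longest-overlap-first rule and the `max_len` guard are all assumptions made for illustration.

```python
from collections import Counter

def extend_contig(seed, reads, min_overlap=15, min_depth=2, min_ratio=0.7, max_len=500):
    """Greedy 3' extension of `seed`, one consensus base at a time (illustrative only)."""
    contig = seed
    while len(contig) < max_len:          # guard against endless extension on repeats
        votes = Counter()
        for read in reads:                # reads are never consumed, so they can be reused
            # A suffix of the contig must equal a prefix of the read,
            # leaving at least one read base to vote for the next position.
            max_olap = min(len(contig), len(read) - 1)
            for olap in range(max_olap, min_overlap - 1, -1):
                if contig.endswith(read[:olap]):
                    votes[read[olap]] += 1
                    break
        depth = sum(votes.values())
        if depth < min_depth:             # analogous to requiring two or more reads (-o 2)
            break
        base, count = votes.most_common(1)[0]
        if count / depth < min_ratio:     # analogous to the 70% consensus rule (-r 0.7)
            break
        contig += base
    return contig
```

With toy data, `extend_contig(seed, reads, min_overlap=5)` walks from the seed through every position that at least two reads support and stops where coverage runs out, which is the behaviour the text attributes to low-frequency clonotypes.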

The results of the analysis of the sequence data generated in
the present study are outlined in Table 1. From the complete data
set of >40.5 M total reads we identified 310,614 assembly seeds.
From these seeds, CDR3β sequences were assembled for 35,762
distinct TCR beta-chain clonotypes. The sequence of each clonotype
is provided in Supplemental File 1. A large proportion of seeds
did not yield additional distinct clonotypes, as there was not adequate
sequence coverage of our sample to extend these seeds into
TRBJ. Clonotype sequences were screened for open reading frames
(ORFs), and 2098 (5.9%) that were found to contain stop codons in
frame with TRBJ were removed, leaving 33,664 distinct beta-chain
clonotypes. The relatively high proportion of clonotypes containing
stop codons does not appear to be due to sequencing error
or misassembly. Rather, we find that 96% of these stop codons map
to nontemplated bases at the V-D-J junction and likely represent
real events captured by our assay. During T-cell maturation, in the
event that the first T-cell receptor beta chain (TCRβ) rearrangement
in a given cell is nonproductive, rearrangement of the second
TCRβ allele is initiated. However, after thymic selection, downregulation
of the nonproductive allele is not absolute (Li and
Wilkinson 1998), and these transcripts may account for the premature
termination codons we identify in the present study.
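
The ORF screen described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `has_in_frame_stop` and the `j_start` argument (the 0-based position of the first base of a complete TRBJ codon, which in practice would come from aligning the contig to TRBJ) are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_in_frame_stop(contig, j_start):
    """Return True if any codon in the frame anchored at `j_start` is a stop codon."""
    frame = j_start % 3                        # reading frame implied by the TRBJ anchor
    for i in range(frame, len(contig) - 2, 3):
        if contig[i:i + 3].upper() in STOP_CODONS:
            return True
    return False
```

Under these assumptions, contigs for which the function returns True would be the ones set aside as likely nonproductive rearrangements.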

Clonotype abundance varied from one to a maximum of 279
(Fig. 2A). Rare clonotypes detected as single copies represented
65.3% of all clonotypes but only 18.8% of the 117,052 total CDR3β
sequence assemblies. Moderately abundant clonotypes detected at
a copy number between two and 19 represented 32.6% of clonotypes
and 64.1% of assemblies. Finally, there were 720 clonotypes
(2.1% of clonotypes) with copy number ≥20, and this small
number of highly abundant clonotypes represented 17.1% of all
assemblies. Since the data are generated from RNA pooled from
many individuals, we expect that the majority of clonotypes will
originate from the more prevalent effector and memory cells of the
population sampled. It is possible that the most abundant clonotypes
therefore represent highly expanded effector cells from those
individuals with a recent antigen encounter. Additionally, some
abundant clonotypes may exemplify the phenomenon of public
T-cell responses, that is, identical TCR rearrangements from multiple
individuals in response to the same antigen (Venturi et al. 2008).
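
The abundance classes reported above (single copy, two to 19 copies, 20 or more copies) can be tabulated from a list of assembled contigs with a few lines of Python. The sketch below is illustrative only; `abundance_classes` and the class labels are hypothetical names, and the input is assumed to be one CDR3β sequence string per complete, in-frame contig.

```python
from collections import Counter

def abundance_classes(contigs):
    """Group identical contigs into clonotypes and summarize by copy-number class.

    Returns {class: (fraction of clonotypes, fraction of contigs)}.
    """
    copies = Counter(contigs)                  # clonotype sequence -> matching contigs
    n_clonotypes = len(copies)
    n_contigs = sum(copies.values())
    classes = {"single": (1, 1), "moderate": (2, 19), "high": (20, float("inf"))}
    summary = {}
    for name, (lo, hi) in classes.items():
        members = [c for c in copies.values() if lo <= c <= hi]
        summary[name] = (len(members) / n_clonotypes, sum(members) / n_contigs)
    return summary
```

Each class yields two fractions because the text reports both views: the share of distinct clonotypes and the share of all assemblies.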

To determine if the depth of sequencing in the current study
showed any trend toward saturation, we took random subsamples
of sequences at intervals of 5 million reads. These were independently
assembled and the number of distinct clonotypes at
each point was plotted (Fig. 2B). The relationship is linear (Pearson
coefficient = 0.999), indicating we have not begun to approach
saturation. This is expected, given the fact that we obtained
33,664 distinct clonotypes from our sequencing target of pooled
peripheral blood mononuclear cells (PBMCs), and, as noted above,
the repertoire size of just one individual has been estimated previously
to be ~10^6 beta chains.

Figure 2. (A) TCRβ diversity. A total of 33,664 TCRβ clonotypes were identified from complete and in-frame CDR3β sequences assembled by iSSAKE.
Clonotypes with a copy number of one (clonotypes identified by a single iSSAKE contig) account for 65.3% of all clonotypes. Clonotypes identified by two
to 19 iSSAKE contigs account for 32.6% of all clonotypes, and high-abundance clonotypes (contig depth ≥20) account for 2.1% of the total. (B) Saturation
analysis. In duplicate experiments we chose independent sets of 5, 10, 15, 20, 30, and 35 M reads at random from the pool of 40,582,229 total sequence
reads in our data set. These subsets of reads were assembled and clonotypes counted as the set of complete, in-frame, nonredundant CDR3β sequences.
The number of clonotypes (mean ± SD for the duplicate experiments) is plotted as a function of the number of reads. Error bars are contained within the
symbols.
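
The saturation analysis can be outlined in code as a subsample-and-count loop followed by a Pearson correlation. The sketch below is a hedged illustration rather than the procedure actually run: `count_clonotypes` is a hypothetical callable standing in for the full assembly and clonotype-counting step, and `saturation_curve` and `pearson` are names introduced here.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def saturation_curve(reads, sizes, count_clonotypes, rng=random):
    """Return (subsample size, clonotype count) pairs for random subsamples of `reads`."""
    return [(n, count_clonotypes(rng.sample(reads, n))) for n in sizes]
```

A coefficient near 1 for clonotype count against subsample size indicates a linear relationship, which is the situation the text interprets as being far from saturation; a curve that flattens at larger sizes would lower the coefficient.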

TRBV, TRBD, and TRBJ usage

With our experimental approach, only the portion of the TRBV
sequence that is present in the seed sequence is available for assignment
of a particular TRBV gene to an assembled CDR3β. This
fact, together with the often high sequence homology among
certain TRBV genes and the replacement of deleted TRBV ends
with nontemplated bases, makes it impossible to assign a single
unique TRBV gene to every clonotype. We were successful, however,
in making an unambiguous assignment for 22,704 (67.4%) of
the assembled clonotypes (Table 1; Supplemental Fig. 2) to one of
49 distinct TRBV genes. Subsequent analysis is based on this portion
of the data set. Usage of this set of 49 TRBV genes ranged from
24.6% for TRBV20-1 to 0.01% for TRBV17 (Fig. 3A).

Several TRBV genes identified in our assembly are ORFs (open
reading frame, International ImMunoGeneTics [IMGT] nomenclature)
that either have changes at conserved amino acid positions
(TRBV6-7, TRBV17, and TRBV7-1) or noncanonical splice
donor sites (TRBV5-3 and TRBV23-1). We also found assemblies
containing the pseudogene TRBV21-1, which has a frameshift in
the leader sequence. As with the appearance of rearrangements
that contain stop codons (see above), it is possible that these assemblies
represent the transcription of an initial nonproductive
rearrangement. TRBV21-1 and TRBV23-1 have previously been
shown to be transcribed (summarized by Folch and Lefranc
2000), and for TRBV23-1, expression has also been demonstrated
at the protein level (Leslie et al. 2006). The transcription of TRBV7-1
is unexpected, as it is also deficient in sequences essential for
recombination. It is possible that functional alleles of TRBV7-1
exist.

All known, functional TRBJ genes are represented in the sequence
assembly and can be assigned unambiguously within the
full set of 33,664 potential clonotype sequences. Usage ranged
from 17.2% for TRBJ2-1 to 1.6% for TRBJ2-6 (Fig. 3B). The pseudogene
TRBJ2-2P was not detected.

TRBD segments sustain substantial base deletion and overall
transformation, so that the segments are usually unrecognizable
without ambiguity. Clonotypes where TRBD could be identified
unambiguously and accurately (with minimum length = 8 nt) represent
5497 out of the 22,704 sequences with accurate TRBV assignment.
For these 5597 clonotypes, we calculate that 49.9% are
TRBD1 and 50.1% are TRBD2.

From our set of 49 positively identified TRBV and 13 TRBJ,
there are 637 potential pairings. We find that 562 of these TRBV-TRBJ
combinations are represented in our data for the 22,704
unique clonotypes (Fig. 4; Supplemental Fig. 3; Supplemental Table 1).
The most frequent pairing is of TRBV20-1 to TRBJ2-1, accounting
for 4.1% of all pairings, whereas 58 TRBV-TRBJ pairs were
identified only once.
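
The pairing statistics above follow from simple counting: 49 TRBV genes times 13 TRBJ genes gives 49 × 13 = 637 possible pairs, of which 562 were observed. A minimal sketch of such a tally is shown below; `pairing_summary` and its dictionary keys are hypothetical, and the input is assumed to be one (TRBV, TRBJ) tuple per clonotype.

```python
from collections import Counter

def pairing_summary(pairs, n_v, n_j):
    """Summarize (TRBV, TRBJ) assignments, one tuple per clonotype."""
    counts = Counter(pairs)
    total = sum(counts.values())
    top_pair, top_count = counts.most_common(1)[0]
    return {
        "possible": n_v * n_j,                 # all V-J combinations
        "observed": len(counts),               # combinations seen at least once
        "top_pair": top_pair,
        "top_fraction": top_count / total,
        "seen_once": sum(1 for c in counts.values() if c == 1),
    }
```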

Sequence diversity of CDR3β

We examined the frequency of base addition and deletion at the
V-D-J junction. The boundary of the CDR3β is not absolute. In
order to compare sequences, we defined CDR3β coordinates as
starting at the codon for the last cysteine of TRBV and ending at the
phenylalanine in the conserved TRBJ segment motif FGXG. In our
data set this defines a subset of 30,366 CDR3β sequences. The
length of CDR3 varies from 21 to 81 nt with a peak at 45 nt (Fig.
5A). While most rearrangements involve the removal and addition
of a few residues, the extent of change can be considerable. Nontemplated
bases and/or deletions were detected in all D-J junctions
examined and in only two instances was there no net change in
sequence at the V-D junction. Nontemplated bases at the V-D
junction are 62.9% GC and the nontemplated bases at the D-J
junction are 54.3% GC. The 7319 CDR3β sequences contained in
the 45-nt peak were used to create nucleotide and amino acid sequence
logos (Fig. 5B,C). The logos are a graphical representation
of a nucleic acid or amino acid multiple sequence alignment
(Schneider and Stephens 1990; Crooks et al. 2004). We find no
evidence of any overrepresented sequence in CDR3β other than
the prevalence of guanines in the center of the nucleic acid logo
(Fig. 5B) and glycines in the center of the amino acid logo (Fig. 5C),
which simply reflect the sequence and coding potential of the
guanine-rich TRBD segments. The conservation apparent at the
left and right ends of the logos reflects the contribution of TRBV
and TRBJ sequences, respectively. Finally, in our sequence assemblies,
there are many examples of independent recombination
events that have produced the same CDR3β amino acid sequence.
In 659 instances, the same CDR3β nucleotide sequence is detected
in association with different TRBV and TRBJ sequences. In addition,
we find 1573 examples of rearrangements where the same
CDR3β amino acid sequence can be translated from divergent
nucleotide sequences.
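
Convergence of the kind described above, where distinct nucleotide sequences encode the same CDR3β amino acid sequence, can be detected by translating each sequence and grouping by translation. The sketch below is an assumption-laden illustration, not the analysis code used in the study: it uses the standard genetic code, expects in-frame sequences containing only A, C, G and T, and the names `translate` and `convergent_groups` are introduced here.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(nt):
    """Translate an in-frame nucleotide string; trailing partial codons are ignored."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))

def convergent_groups(cdr3_nt_seqs):
    """Map each amino acid sequence to its distinct encoding nucleotide sequences,
    keeping only amino acid sequences encoded by more than one nucleotide sequence."""
    by_aa = defaultdict(set)
    for nt in cdr3_nt_seqs:
        nt = nt.upper()
        by_aa[translate(nt)].add(nt)
    return {aa: sorted(nts) for aa, nts in by_aa.items() if len(nts) > 1}
```

The number of keys in the returned dictionary corresponds to the count of convergence examples in the sense used by the text.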


Figure 3. TRBV and TRBJ usage. (A) Relative frequency of usage of TRBV segments for the subset of clonotypes with an unambiguous TRBV gene
segment assignment (n = 22,704). (B) Relative frequency of usage of TRBJ segments for the set of all clonotypes (n = 33,664).

Sequence profiling the T-cell repertoire


Figure 4. Frequencies of V-J pairing calculated from the subset of clonotypes
with an unambiguous TRBV gene segment assignment (n =
22,704). (Blue to purple rectangular bands) TRBJ segments, (red to cyan
rectangular bands) TRBV segments. The width of the bands is proportional
to the number of times the TRBV and TRBJ genes connected by the band
co-occur in CDR3β sequences. TRBV and TRBJ segments are arranged from
left to right and right to left, respectively, and ordered by total pairing links
they share. (This illustrates the data contained in Supplemental Table 1.)
This figure was generated using the Circos software package (Krzywinski
et al. 2009).


Verification 

Validation of the assembly and analysis was obtained from the
intersection of iSSAKE-generated data with that obtained by
Sanger sequencing of 5'-RACE clones. A total of 288 unique TCR
sequences were obtained by Sanger sequencing of the cloned
5'-RACE products, and, of these, 69% were found in our assembled
Illumina data. In addition, comparison of the usage frequencies of
TRBJ genes from the iSSAKE contigs and the Sanger sequences
shows a very similar profile (Pearson coefficient 0.904, data not
shown) and provides a validation of the iSSAKE sequence assembly.
Comparison of the frequencies for TRBV was limited to those
49 genes that could be unambiguously identified from the Illumina
assembly (Pearson coefficient 0.822). Next, to estimate sequence
error attributable to PCR and reverse transcription, we used
Sanger sequencing to evaluate TCRβ 5'-RACE products amplified
and cloned from the cell line DES M26, a melanoma-specific T-cell
line that carries a single TCRβ rearrangement. Single-pass sequence
reads from 311 independent colonies were quality-trimmed to Q50
(representing the highest quality subset with only one predicted
error per 100,000 bases) and assembled, demonstrating an accuracy
of 99.91%. This sets a practical upper limit on the contribution
of reverse transcriptase- and polymerase-introduced errors in
the method as it is described here.
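
For reference, a Phred quality score Q corresponds to a predicted per-base error probability of 10^(-Q/10), which is why Q50 equates to one predicted error per 100,000 bases. The sketch below works through that arithmetic. The function names are illustrative and are not part of any tool used in this study.

```python
import math


def phred_to_error_prob(q):
    """Predicted per-base error probability for Phred quality score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred-scaled quality for a per-base error probability p."""
    return -10 * math.log10(p)


def expected_errors(q, n_bases):
    """Expected number of base-call errors in n_bases at uniform quality q."""
    return phred_to_error_prob(q) * n_bases


# Q50 trimming: one predicted sequencing error per 100,000 bases.
q50_errors_per_100k = expected_errors(50, 100_000)

# The observed accuracy of 99.91% is an error rate of 0.09%, far above the
# Q50 sequencing-error expectation; on the Phred scale it is roughly Q30.
observed_error_rate = 1 - 0.9991
observed_phred = error_prob_to_phred(observed_error_rate)
```

Under these assumptions the residual error is dominated by sources other than Sanger base calling, which is consistent with reading the 99.91% figure as an upper bound on enzyme-introduced error.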

Discussion 

By massively parallel sequencing of CDR3β amplicons from pooled
leukocytes, we have identified 33,664 distinct human CDR3β
sequences. As of May 2009 there were only 3187 unique human
TCRβ mRNA sequences in GenBank, not all of which include the
CDR3β region. The IMGT database reports 5303 rearranged human
TCRβ sequences, 2927 of which are shared with GenBank. Thus, in
a single experiment, we have increased the number of known
sequences by an order of magnitude. Analysis of the data from the
present study has provided information, at unprecedented precision,
on many fundamental TCRβ properties such as preferences
for nucleotide removal and addition at recombination junctions
(Table 2) and the extent of CDR3β length diversity (Fig. 5A). As
expected from all previous studies, we see that certain TRBV and
TRBJ genes are commonly utilized while others are quite rare (Fig.
3A,B) and the pairing of TRBV and TRBJ is not random (Fig. 4;
Supplemental Table 1; Rosenberg et al. 1992; Even et al. 1995; Hall
and Lanchbury 1995; Roldan et al. 1995; Manfras et al. 1999). The
reasons for bias are not clearly understood but are likely due to
a combination of proximity effects and recombination signal sequence
compatibilities that influence initial TCR development,
plus thymic selection and immune challenge that modify the
representation of selected clones in the extant repertoire (Krangel
2003). It must be emphasized that the results presented here for
TRBV and TRBJ frequency and pairing are obtained from a biased
sample, where the individual repertoires of subjects contributing
to the pool have been skewed by antigen encounter and the
individual's HLA type. These results cannot be taken to represent
the innate TRBV and TRBJ usage and pairing preferences of the

Figure 5. CDR3β nucleotide length distribution and sequence composition
of the most abundant CDR3β length. To explore CDR3β length
variation we used a precise length criterion, defined as all bases between
the last cysteine of TRBV and the phenylalanine in the TRBJ segment motif
FGXG. Of the 33,664 total clonotypes, 30,366 could be classified in this
manner, and the length distribution of this subset was plotted (A; x-axis,
CDR3β length in nt). The most frequently observed length was 45 nt. For
the subset of clonotypes with 45 nt CDR3β sequences, we created logos for
the nucleotide (B) and inferred amino acid (C) composition, using WebLogo
(Crooks et al. 2004).
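
The length criterion in this caption can be sketched in a few lines. This is one possible reading of it, not the authors' implementation. It assumes an in-frame nucleotide sequence, takes "between" to exclude both the cysteine and the phenylalanine codons, and uses the first FGXG match in the translation. The function names and the toy sequence are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
FGXG = re.compile(r"FG.G")  # TRBJ motif; X is any residue


def translate(nt):
    """Translate an in-frame nucleotide string; unknown codons become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def cdr3_between_cys_and_phe(nt_in_frame):
    """Return the bases strictly between the last Cys codon upstream of the
    FGXG motif and the Phe codon of that motif, or None if either is absent.
    Assumption: both flanking codons are excluded from the CDR3."""
    protein = translate(nt_in_frame)
    motif = FGXG.search(protein)
    if motif is None:
        return None
    phe = motif.start()
    cys = protein.rfind("C", 0, phe)
    if cys == -1:
        return None
    return nt_in_frame[3 * (cys + 1):3 * phe]


# Toy sequence (hypothetical) encoding CASSLGQGNTEAFFGQG.
toy_nt = ("TGTGCC" "AGCAGC" "TTGGGA" "CAGGGG" "AACACT"
          "GAAGCT" "TTCTTT" "GGACAA" "GGG")
cdr3 = cdr3_between_cys_and_phe(toy_nt)
```

Tallying `len(cdr3)` over all classifiable clonotypes, for example with `collections.Counter`, would give a length distribution of the kind plotted in panel A.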
Table 2. Frequency of base addition and deletion at the TCRβ V-D-J junction

Bases  Deleted 3'-V  Added V-D     Deleted 5'-D  Deleted 3'-D  Added D-J     Deleted 5'-J
       bases         bases         bases^a       bases^a       bases         bases
0      0.2384        0.1625        0.3231        0.2401        0.1246        0.1197
1      0.1629        0.1179        0.1683        0.1621        0.1035        0.0867
2      0.1008        0.1466        0.1299        0.2432        0.1457        0.1037
3      0.1159        0.1403        0.0746        0.2072        0.1466        0.0930
4      0.1359        0.1330        0.1077        0.0631        0.1257        0.1253
5      0.1044        0.0817        0.0555        0.0355        0.0975        0.1143
6      0.0743        0.0635        0.0504        0.0202        0.0793        0.0897
7      0.0391        0.0480        0.0444        0.0133        0.0389        0.0734
8      0.0152        0.0357        0.0462        0.0153        0.0369        0.0548
9      0.0061        0.0233                                    0.0227        0.0327
10     0.0043        0.0184                                    0.0133        0.0207
11     0.0022        0.0073                                    0.0164        0.0131
12     0.0004        0.0073                                    0.0076        0.0101
13     4.40 × 10^-5  0.0071                                    0.0135        0.0095
14                   0.0033                                    0.0053        0.0072
15                   0.0013                                    0.0060        0.0139
16                   0.0009                                    0.0045        0.0059
17                   0.0005                                    0.0045        0.0077
18                   0.0002                                    0.0029        0.0065
19                   0.0004                                    0.0020        0.0032
20                   0.0002                                    0.0009        0.0027
21                   na                                        0.0005        0.0019
22                   0.0009                                    0.0002        0.0013
23                                                                           0.0017
24                                                                           0.0008
25                                                                           0.0005

^a D segments (D1: CCCACAGGCCCC, D2: GGGACTAGCGGG[AG]GGG) were trimmed 1 bp at a
time and used for the search until D = 8 nt.


V(D)J recombination process during T-cell development. Further, it
is possible that preferential PCR amplification of certain TRBV and
TRBJ sequences over others has skewed the usage frequencies
reported here. We have attempted to mitigate this by using
5'-RACE, rather than TRBV-specific primers, in order to reduce bias
from differential primer annealing or variable amplicon lengths.
The uniformity of the CDR3 length distribution presented in Figure
5 suggests that length bias has not been an issue. However,
future studies with independent replicates from multiple individuals
may be informative regarding other, unanticipated sources of
bias.

To differentiate the bias in TRBV and TRBJ representation that
is incurred during V(D)J recombination and thymic selection from
bias that is caused by antigen encounter in the periphery, it should
be possible to sort and independently profile the naive (CD25−,
CD44−, CD45RA+, CD62L+) and memory/effector (CD25+, CD44+,
CD45RO+, CD62L−) peripheral T-cell compartments. However,
biases that are strictly due to recombination can only be delineated
by profiling rearranged but pre-selected double-negative (CD4−,
CD8−) cells from the thymic cortex. Studies of this nature would
be most tractable in mouse.

Individual TCR diversity has been estimated as ~10^6 beta
chains (Arstila et al. 1999). This estimate relied on the calculation
that TRBV18 to TRBJ1-4 pairing would occur at a frequency of
0.00024 in the repertoire. Our findings do not conflict with this
assumption. We see pairing of TRBV18 and TRBJ1-4 in 4 of 22,704
unique clonotypes (which represents a frequency of 0.00018).
However, at the present time we cannot provide insight into
individual repertoire size because our sample is derived from blood
pooled from multiple individuals. In fact, there may not be
a "typical" individual peripheral blood T-cell repertoire, given that
repertoires are skewed by many factors, including HLA type,
historical antigen encounters, and current responses to acute
infection.

There is considerable utility in our method of TCR sequence
profiling (or its adaptation to immunoglobulin sequencing) even in
the absence of heroic sequencing depth, since clonotypes that
respond to a given immune challenge (effector and subsequently
memory cells) will be present in the peripheral blood in higher copy
number. There are many types of immune challenge, for example,
acute and latent infections, autoimmunity, organ transplantation,
and vaccination against infectious agents and malignancies. In these
scenarios, and without a priori knowledge of the antigen, sequencing
prospective samples to moderate depth should reveal responsive
clonotypes. Additional applications may include the evaluation of
immune reconstitution following, for example, bone marrow
transplant or the initiation of highly active antiretroviral therapy
(HAART) for HIV infection. It should also be possible to more
readily identify the public T-cell clonotypes associated with
infectious agents or tumor neoantigens and correlate these with
effective immune responses.


Methods 

5'-RACE

Peripheral leukocyte polyA+ RNA isolated from 165 L of peripheral
blood pooled from 380 males (ages 18-40) and 170 females (ages
18-40) was obtained from a commercial supplier (Clontech
#636170). First-strand cDNA was synthesized using a published
TRBC primer (5'>CACGTGGTCGGGGWAGAAGC<3') (Ozawa et al.
2008). A target-switching oligo
(5'>AAGCAGTGGTAACAACGCAGAGTACGCGGG<3') (Peters et al. 1999)
was added to provide a 5' template for RACE (reaction conditions:
100 ng of RNA, oligonucleotides 1 µM each, 2 mM DTT, 1 mM each
dNTP, 25 mM Tris-HCl pH 8.3, 37.5 mM KCl, 1.5 mM MgCl2, and
400 units of Superscript II [Invitrogen] in a 20-µL volume.
Extension was 90 min at 42°C followed by inactivation for 7 min at
72°C.). A control reaction with no enzyme was included to ensure
that subsequent PCR products were the result of amplification from
a reverse-transcribed template. PCR was performed using Platinum
Pfx (Invitrogen) and 0.5 µL of first-strand reactions with the
target-switching oligonucleotide (above) and a nested TRBC primer
(5'>TGGTGCGGCCGCTCTCTGCTTCTGATGGCTCAAAC<3') tailed
with a NotI restriction site (reaction conditions: 1 unit of enzyme,
2x Pfx amplification buffer, 1 mM MgSO4, oligonucleotides
0.3 µM each, and 0.3 mM each dNTP in a 50-µL volume; 2 min
denaturation at 94°C was followed by 30 cycles of 30 sec at 94°C,
30 sec at 55°C, and 45 sec at 68°C, plus a final extension for 5 min
at 68°C). In order to obtain a cleaner product, PCR was performed
on 0.1 µL of the first-round reaction with a nested target-switching
oligonucleotide (5'>AGTTGCGGCCGCTAACAACGCAGAGTACGCGGG<3'),
and an equimolar combination of two primers
(5'>CACAGCGGCCGCGGGTGGGAACACCTTGTTCAGGT<3') and
(5'>CACAGCGGCCGCGGGTGGGAACACGTTTTTCAGGT<3') specific
for TRBC1 and TRBC2, respectively (reaction conditions: 1 unit of
enzyme, 2x Pfx amplification buffer, 1 mM MgSO4, 0.3 µM
oligonucleotides, and 0.3 mM each dNTP in a 50-µL volume; 2 min
denaturation at 94°C was followed by 20 cycles of 30 sec at 94°C
and 75 sec at 68°C, plus a final extension for 5 min at 68°C.). The
nested PCR reaction was loaded on a 12% polyacrylamide gel and
the band centered at 520 bp was visualized using SYBR Green
(Lonza), excised, and processed for sequencing (below).

Preparation of 5'-RACE products for Illumina sequencing

Eight nested PCR reactions were pooled and purified using 20 µL of
QIAEX II matrix (Qiagen). The eluate was digested with NotI
(reaction conditions: 4.5 µg of DNA, 50 units of NotI [New England
Biolabs], 1x NEB3 Buffer in 150 µL volume, 22 h at 37°C), and the
band centered at 520 bp was purified from a 12% polyacrylamide
gel. The fragment was then concatenated by ligation (reaction
conditions: 500 ng of DNA, 1x NEB T4 DNA ligase buffer, 200
units of T4 DNA ligase [New England Biolabs] in a 5-µL volume)
and stored at 4°C. Prior to sonication, the ligation product was
cleaned with QIAEX II and eluted in a 20-µL volume. After 20 min
sonication, the sample was loaded on an 8% polyacrylamide gel, and
the fraction from 100-300 bp was excised, purified, and blunted
(reaction conditions: 1x NEB Blunting Buffer, 100 µM dNTPs, 1 µL
of Blunting Enzyme Mix [New England Biolabs E1201S] in a 25-µL
volume) for 30 min at 21°C. The product was purified by phenol/
chloroform extraction and ethanol precipitation prior to A-tailing
(reaction conditions: 5 units of Klenow Fragment [3'→5' exo−]
[New England Biolabs], 1x reaction buffer, 200 µM dATP in
a 50-µL volume, 30 min at 37°C). The product was purified by
phenol/chloroform extraction and ethanol precipitation in
preparation for ligation to Illumina TS adapters (www.illumina.com)
(reaction conditions: 1x NEB T4 DNA ligase buffer, 1200 units of
T4 DNA ligase [New England Biolabs], 1 µL of TS adapters in a
30-µL volume, 15 min at 21°C). The product was purified using
a QiaQuick column (Qiagen) and eluted in a volume of 30 µL. Ten
microliters was then amplified by PCR using Illumina primers 1.1
and 2.2 (reaction conditions: 1 unit of enzyme, 2x Pfx amplification
buffer, 1 mM MgSO4, 0.3 µM oligonucleotides, and 0.3 mM
each dNTP in a 25-µL volume. Two minute denaturation at 94°C
was followed by 15 cycles of 30 sec at 94°C, 30 sec at 65°C, and 30
sec at 68°C, plus a final extension of 5 min at 68°C.). The PCR
product was purified using a MinElute column (Qiagen) with a
final volume of 13 µL.

Illumina sequencing and analysis

We generated 18.8 million 36-nt reads and 21.7 million 50-nt
single-end reads with the Illumina GAII analyzer, using sequencing
chemistry version 2 (Illumina FC-204-2036) and cleavage
reagent version 2 (Illumina #1005159). The combined set of
40,582,229 short reads was assembled using iSSAKE (with parameters:
-m 15 -o 2 -r 0.7), as previously described (Warren et al.
2009). Briefly, reads aligning to the end of Ensembl TRBV gene
predictions (Flicek et al. 2008) and having three or more consecutive
unmatched bases in the adjacent CDR3β were used to seed
iSSAKE de novo assemblies of CDR3β. Only contigs with a depth of
at least two reads at each position were retained for analysis.
However, we did not set a requirement of double coverage for the seed
sequences themselves. Given that a seed sequence could be long
enough to span the complete CDR3β, and there is no requirement
for redundant coverage of seed sequences, 21,973 of the 33,664
clonotypes in our data set are represented by a single sequence read.
This could, in principle, artifactually inflate the diversity of the
repertoire, but in reality there is probably very little influence
given that previous benchmarking of our assembly method using
simulated, error-prone short sequence reads from 1 million
computationally modeled TCRβ sequences showed 93% sensitivity
and 99.96% accuracy for CDR3β clonotypes present at >3 p.p.m.
(Warren et al. 2009).
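
The retention rule described above can be sketched as a simple predicate. This is one reading of the text, not iSSAKE's actual implementation. It assumes that per-position read depths are available for each contig and that positions covered by the seed read are exempt from the two-read requirement. The function name and data layout are hypothetical.

```python
def passes_depth_filter(depths, seed_start, seed_end, min_depth=2):
    """Return True if every position outside the seed span has at least
    min_depth reads.

    depths     -- per-position read depth along the contig (hypothetical layout)
    seed_start -- first contig position covered by the seed (0-based)
    seed_end   -- one past the last contig position covered by the seed
    """
    return all(
        depth >= min_depth
        for pos, depth in enumerate(depths)
        if not (seed_start <= pos < seed_end)
    )


# A seed that spans the whole contig passes even at single coverage, which is
# how a clonotype can be represented by a single sequence read.
single_read_clonotype = passes_depth_filter([1, 1, 1, 1], 0, 4)

# Extension beyond the seed needs at least two reads at every position.
well_supported = passes_depth_filter([1, 1, 1, 2, 3, 2], 0, 3)
poorly_supported = passes_depth_filter([1, 1, 1, 2, 1], 0, 3)
```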

The iSSAKE contigs (TCRβ reconstructions) were searched for
the presence of 15 consecutive TRBJ segment bases. For any TRBJ
segment, any 15-letter word from base position 1 to 25 uniquely
characterizes that segment and allows the identification of the
precise TRBJ segment boundary as well as the number of TRBJ bases
deleted. TRBV segments, TRBV segment boundaries, and the exact
number of deleted TRBV bases were inferred by tracing back the
seed alignments that yielded the contigs and singlets under
scrutiny. Sequence clonotypes were identified by extracting the
contiguous bases spanning the last 15 bases of TRBV to the first
recognizable 15 TRBJ segment bases, inclusively. During this
automated process, we tracked clonotypes originating from seeds
that aligned equally well to more than one TRBV segment, checked
the sequence frame and peptide translation of the TCRβ
reconstructions, and extracted the CDR3β, if applicable. The mined
data were written to file and organized into a MySQL relational
database for further analysis.
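
The 15-mer lookup described above can be sketched as follows. This is a minimal illustration, not the code used in this study. The two "J" sequences are made-up placeholders rather than real TRBJ genes, and the sketch assumes that words are indexed by start offset 0 to 25, so that the offset of the first matching word equals the number of 5'-J bases deleted. Words found at more than one location are discarded as ambiguous.

```python
WORD = 15  # word size used for TRBJ identification


def build_index(j_segments, max_offset=25):
    """Map each unique 15-mer to (segment name, 0-based offset in the segment)."""
    index, ambiguous = {}, set()
    for name, seq in j_segments.items():
        last = min(max_offset, len(seq) - WORD)
        for off in range(last + 1):
            word = seq[off:off + WORD]
            if word in index and index[word] != (name, off):
                ambiguous.add(word)
            index[word] = (name, off)
    for word in ambiguous:
        del index[word]
    return index


def find_j(contig, index):
    """Scan 5'->3' and return (segment, J boundary in contig, 5'-J bases deleted)
    for the first indexed word found, or None."""
    for i in range(len(contig) - WORD + 1):
        hit = index.get(contig[i:i + WORD])
        if hit is not None:
            name, off = hit
            return name, i, off
    return None


# Placeholder sequences (NOT real TRBJ genes), for illustration only.
J_TOY = {
    "J_toy1": "ACGTTGCAAGCTTAGGCTAACCGTATGCATCGGATCCTAGGTCA",
    "J_toy2": "TTGACCGATAGGCATTCGAAGTCCTGAACGTTAGCCATGGACTT",
}
index = build_index(J_TOY)

prefix = "GGGGGGGGGGCCCCCC"               # stands in for V and N bases
trimmed = prefix + J_TOY["J_toy1"][3:]    # J_toy1 with 3 bases deleted
intact = prefix + J_TOY["J_toy2"]         # J_toy2 with no deletion
hit_trimmed = find_j(trimmed, index)
hit_intact = find_j(intact, index)
```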

Fine-resolution analysis of N-diversity mechanisms at the
V-D-J junction was made possible by searching for TRBD1 (12 nt)
and TRBD2 (16 nt) bases between the TRBV and TRBJ boundaries
using the longest to shortest TRBD word sizes. To favor accuracy
over yield (some base deletions/additions leave no recognizable
TRBD bases), we chose to search down to a minimum word of 8 nt
for both TRBD segments. Unambiguous detection of TRBD bases
allows precise identification of TRBD segment boundaries, the
characterization of nontemplated bases at the V-D and D-J
junctions, and the frequency calculation of TRBD deleted bases
for all clonotypes.
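
A longest-to-shortest word search of this kind can be sketched as below. This is an illustration under stated assumptions, not the authors' code. It uses only the two TRBD2 variants implied by the bracket notation in the Table 2 footnote, and the junction strings are invented. The function name and return layout are hypothetical.

```python
def find_d(junction, d_segments, min_word=8):
    """Search the junction for the longest D-derived word (length >= min_word).

    Returns (segment name, start in junction, 5'-D bases deleted,
    3'-D bases deleted, matched word), or None if nothing is found.
    """
    longest = max(len(seq) for seq in d_segments.values())
    for size in range(longest, min_word - 1, -1):      # longest words first
        for name, d in d_segments.items():
            for off in range(len(d) - size + 1):       # trim 1 bp at a time
                word = d[off:off + size]
                start = junction.find(word)
                if start != -1:
                    return name, start, off, len(d) - off - size, word
    return None


# TRBD2 variants as given by the [AG] notation in the Table 2 footnote.
D_SEGMENTS = {
    "TRBD2_G": "GGGACTAGCGGGGGGG",
    "TRBD2_A": "GGGACTAGCGGGAGGG",
}

# Invented junctions between the V and J boundaries.
hit_9nt = find_d("AGCAGTCTAGCGGGGTTCC", D_SEGMENTS)
hit_8nt = find_d("AAAACTAGCGGTTT", D_SEGMENTS)
no_hit = find_d("ACGTACGT", D_SEGMENTS)
```

If the junction is taken to start at the TRBV boundary and end at the TRBJ boundary, the returned start position and the bases remaining after the matched word would give the number of bases added at the V-D and D-J junctions, respectively. Where the two variants share the matched word, the sketch reports whichever is listed first.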

Sanger sequencing and analysis

Purified fragment was inserted into the vector pCR-4 using the
recommended conditions for Invitrogen's Zero Blunt TOPO PCR
Cloning Kit for Sequencing and One Shot MAX Efficiency
DH5α-T1R competent cells. M13FP and M13RP were used to prime
Sanger sequencing reactions. We generated paired reads from 384
clones.

Low-quality bases were trimmed and vector sequences
screened using Cross_match (www.phrap.org). A total of 736 quality-
and vector-trimmed paired-end reads remained and were assembled
using CAP3 (Huang and Madan 1999). Five hundred eighty-two
reads (331 clones) contained the complete V(D)J sequence and
collapsed into 220 unique contigs (including single-read contigs, or
singlets). The resulting contigs and singlets were analyzed for the
frequency of predicted Vβ and 14 Jβ gene segments. Briefly, we
aligned contigs and singlets against two separate databases of
Ensembl (Flicek et al. 2008) human gene predictions for 54 Vβ and
14 Jβ gene segments using WU-BLAST (default parameters with
-b 3000 -v 3000). For each of the database sequences we tallied the
Vβ and/or Jβ gene alignments having the highest sequence identity
to the contig and singlet sequences. Sequence alignments were
analyzed using custom scripts, noting both the exact position of
each 3'-Vβ and/or 5'-Jβ segment onto the mRNA, and a report of Vβ
frequency and Jβ frequency was generated. Mapping of both the
Vβ and Jβ segment positions and ensuring that the translation frame
was preserved and consistent with peptide predictions permitted
the extraction of variable CDR3β bases between the two segments
(not shown).
Abstract

Background: Host T-cell responses are associated with favorable outcomes in
epithelial ovarian cancer (EOC), but it remains unclear how best to promote these
responses in patients. Toward this goal, we evaluated a panel of clinically relevant
cytokines for the ability to enhance multiple T-cell effector functions (polyfunctionality)
in the native tumor environment. Methodology/Principal Findings: Experiments were
performed with resident CD8+ and CD4+ T cells in bulk ascites cell preparations from
high-grade serous EOC patients. T cells were stimulated with α-CD3 in the presence of
100% autologous ascites fluid with or without exogenous IL-2, IL-12, IL-18 or IL-21,
alone or in combination. T-cell proliferation (Ki-67) and function (IFN-γ, TNF-α, IL-2,
CCL4, and CD107a expression) were assessed by multi-parameter flow cytometry. We
found that ascites fluid had variable effects on CD8+ and CD4+ T-cell proliferation, but
inhibited T-cell function in most patient samples, with CD107a, IFN-γ, and CCL4
showing the greatest inhibition. T-cell proliferation was enhanced by exogenous IL-2, but
other T-cell functions were largely unaffected by single cytokines. The combination of
IL-2 with cytokines engaging complementary signaling pathways, in particular IL-12 and
IL-18, enhanced expression of IFN-γ, TNF-α, and CCL4 in all patient samples by
promoting polyfunctional T-cell responses. However, no combination of cytokines
enhanced expression of CD107a or IL-2. Conclusions/Significance: The EOC ascites
environment disrupts multiple T-cell functions, and exogenous cytokines engaging
diverse signaling pathways only partially reverse these effects. Our results may explain
the limited efficacy of cytokine therapies for EOC to date. Full restoration of T-cell
function will require activation of signaling pathways beyond those engaged by IL-2,
IL-12, IL-18 and IL-21.

Key Words: T-cell therapy, immunotherapy, cytokines, polyfunctional, ovarian cancer
Introduction

Numerous studies in the past decade have shown that the immune system
influences clinical outcomes in patients with high-grade serous epithelial ovarian cancer
(hereafter referred to as EOC). Specifically, the presence of tumor-infiltrating CD8+ T
cells is associated with prolonged disease-free and overall survival [1,2]. Moreover,
elevated numbers of CD56+ T cells in the ascites compartment are correlated with
increased platinum sensitivity [3]. Together, these studies suggest that enhancing the
endogenous T-cell response to EOC may be of clinical benefit to patients. However, the
ovarian tumor environment contains many immunosuppressive cell types and factors that
can oppose anti-tumor T-cell responses. Accordingly, elevated numbers of
tumor-associated regulatory T cells (Tregs) and macrophages are associated with poor
survival in EOC [1,2]. Many soluble immunosuppressive factors are also found in ascites,
including IL-10, TGF-β, VEGF, B7-H1/PD-L1, B7-H4, SDF-1, EBAG9/RCAS1, Fas
ligand, and soluble IL-2 receptor [1,2].

The heterogeneity of immunosuppressive mechanisms in EOC presents a
formidable challenge for immunotherapy, as it suggests that multiple immunosuppressive
mechanisms might need to be reversed to restore T-cell function. For example, Treg
depletion is being evaluated as a means to enhance immunity against several human
cancers [4], including EOC [5], but at best would only reverse one mechanism of
immunosuppression, leaving other immunological barriers intact. Rather than attempt to
disable these barriers one by one, a more pragmatic clinical approach would be to deliver
factors that can broadly override immunosuppressive barriers in the tumor environment.
Thus, identifying the signals or conditions that promote T-cell responses in the EOC
environment may lead to more effective immunotherapeutic strategies.

To this end, we previously developed a mouse model of EOC that allows
functional assessment of CD8+ T cell responses in the ovarian tumor environment [6].
Specifically, we engineered a murine EOC tumor line to express a CD8+ T cell epitope
from the model antigen ovalbumin. Mice bearing advanced, widely disseminated tumors
with extensive ascites were treated by adoptive transfer of CD8+ T cells specific for the
ovalbumin epitope. Remarkably, adoptively transferred CD8+ T cells underwent
vigorous proliferation not only in lymph nodes and peripheral blood, but also in the
ascites and tumor compartments. Indeed, at the peak of the response, the transferred T
cells constituted up to 40% of CD8+ T cells in peripheral blood, and up to 96% of CD8+
T cells in ascites. This profound proliferative response was followed by rapid and
near-complete tumor regression in the majority of animals, demonstrating that the T cells
were functionally active. Importantly, the proliferation and anti-tumor activity of the
transferred CD8+ T cells were entirely dependent on IL-2/IL-15 signaling, as
demonstrated by genetic disruption of the IL-2 receptor alpha (CD25) or beta (CD122)
subunits [6]. Thus, even in the setting of advanced ovarian tumors, CD8+ T cells
mounted potent anti-tumor responses provided that IL-2/IL-15 signaling pathways were
intact.

The foregoing results raised the issue of whether cytokines such as IL-2 can
similarly override the immunosuppressive effects of ascites in human EOC. Previous in
vitro studies with human EOC samples have shown that IL-2 can partially restore
lymphokine-activated killer (LAK) cell cytotoxicity in the presence of 50% ascites fluid,
while the combination of IL-2 with TCR stimulation [7] or IL-12 [8,9] fully restored
LAK cytotoxicity. However, use of 50% ascites fluid alone does not fully recapitulate the
ovarian tumor environment, which also contains immunosuppressive cell types such as
Tregs and MDSCs [1,2]. Furthermore, LAKs are predominantly composed of NK cells,
which may be affected differently by ascites compared to T cells. These past studies also
measured cytotoxicity by bulk LAK cell preparations and hence did not elucidate the
effects of cytokines on different T cell subsets. Finally, in vitro cytotoxicity is only one
aspect of T-cell function and does not always correlate with effective T-cell responses in
vivo [10].

There is growing appreciation of the importance of polyfunctional T cells, i.e., T
cells that can simultaneously perform multiple functions, in effective immune responses
[11]. Specifically, polyfunctional T-cell responses are associated with protective
immunity after vaccination against smallpox (vaccinia virus) [12], yellow fever [13],
tuberculosis [14,15], and leishmaniasis [16]. Elevated numbers of polyfunctional T cells
are also correlated with favorable outcomes in a variety of disease settings, including
HIV/AIDS [17,18,19,20,21,22,23,24,25,26,27], hepatitis C [28] and lymphocytic
choriomeningitis [29]. In the setting of cancer, several studies have found enhanced
numbers of tumor-specific polyfunctional T cells in patients responding favorably to
various forms of immunotherapy, including adoptive T-cell therapy, α-CTLA antibody
therapy, and a multi-peptide + DNA vaccine [30,31,32,33,34]. However, little is known
about the effects of the EOC environment on polyfunctional T-cell responses.
119  In  the  present  study,  we  used  multiparameter  flow  cytometry  to  assess  the  effects 

120  of  the  human  EOC  ascites  environment  on  multiple  T-cell  functions,  including  expression 
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121  of  IFN-y,  TNF-a,  IL-2,  CCL4,  and  surface  localization  of  CD107a  (a  marker  of 

122  degranulation).  Furthermore,  we  assessed  the  ability  of  IL-2  and  three  other  clinically 

123  relevant  cytokines  (IL-12,  IL-18  and  IL-21)  to  restore  T-cell  function  in  the  ascites  tumor 

124  environment.  We  found  that  (a)  ascites  inhibited  some  T-cell  functions  (CD107a,  CCL4, 

125  and  IFN-y)  more  strongly  than  others  (proliferation,  TNF-a,  and  IL-2);  (b)  exogenous  IL- 

126  2  generally  enhanced  T-cell  proliferation  but  had  little  effect  on  other  T-cell  functions; 

127  and  (c)  the  combination  of  IL-2  with  cytokines  engaging  complementary  signaling 

128  pathways  (in  particular  IL-12  and  IL- 1 8)  enhanced  proliferation  and  expression  of  IFN-y, 

129  CCL4  and  TNF-a  but  failed  to  enhance  CD107a  or  IL-2.  Thus,  T-cell  function  in  the 

130  human  EOC  environment  can  be  partially  restored  by  exogenous  cytokines.  However, 

131  full  restoration  of  T-cell  function  will  require  activation  of  signaling  pathways  beyond 

132  those  engaged  by  IL-2,  IL-12,  IL-18  and  IL-21. 

Results

EOC ascites disrupts multiple T-cell functions

Primary ascites specimens were collected from patients with high-grade serous
EOC (Table 1). To model the native ascites tumor environment as best as possible in
vitro, we analyzed bulk ascites cell pellets, which in addition to CD8+ and CD4+ T cells,
contained variable numbers of tumor cells, Tregs, B cells, NK cells, and CD14+ cells
(Table 2). Moreover, cells were cultured in 100% autologous ascites fluid to include any
soluble factors, such as TGF-β (Table 2), that may impact T-cell function. Because of the
paucity of well-defined T-cell antigens in ovarian cancer, we stimulated cultures with
α-CD3 antibody, which should stimulate both effector and regulatory subsets. Thus, each
patient’s T cells were analyzed in the presence of the full complement of naturally
occurring tumor cells, immunosuppressive cells, and soluble factors.

We recently demonstrated in a murine model of EOC that the ascites environment
can be highly permissive to T-cell proliferation [6]. To see if this was also true in human
EOC samples, we tested the effect of ascites fluid on T-cell proliferation. As seen in Fig.
1A, ascites fluid had highly variable effects on CD8+ T-cell proliferation (as measured by
Ki-67 expression), ranging from strong inhibition (1/5 patients) to enhancement of
proliferation (2/5 patients). In general, similar trends were seen for CD4+ T cells (Fig.
1A, and data not shown). Thus, as suggested by our murine studies [6], the human EOC
ascites environment is not universally suppressive and in some cases can enhance T-cell
proliferation.
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166  We  next  detennined  whether  other  T-cell  functions  were  similarly  affected  by 

167  ovarian  ascites.  Flow  cytometric  analysis  was  used  to  assess  five  commonly  used  markers 

168  of  T-cell  function:  IFN-y,  TNF-a,  IL-2,  CCL4,  andCD107a  [11,12,17,18,19].  The  first 

169  four  markers  are  cytokines  involved  in  T-cell  proliferation  or  effector  function,  whereas 

170  CD  107a  is  expressed  on  the  surface  of  T  cells  undergoing  cellular  degranulation  and  thus 

171  serves  as  a  marker  of  cytotoxicity  [11].  In  all  patient  samples  tested,  ascites  fluid  caused  a 

172  dramatic  inhibition  of  CD  107a  expression  by  CD8+  T  cells,  suggesting  that  the  ascites 

173  environment  generally  inhibits  T-cell  degranulation  (Fig.  1A).  CCL4  and  IFN-y 

174  expression  were  also  inhibited  by  ascites  in  the  majority  of  patient  samples  (Fig.  1  A).  T 

175  cells  expressing  IL-2  and  TNF-a  were  relatively  rare,  and  were  minimally  affected  by 

176  ascites  fluid  in  most  cases  (Fig.  1  A).  Similar  results  were  seen  with  CD4+  T  cells  (data 

177  not  shown).  Intriguingly,  the  effects  of  ascites  on  T-cell  proliferation  and  other  T-cell 

178  functions  were  largely  uncoupled.  For  example,  with  patient  samples  IROC008  and 

179  IROC036,  ascites  fluid  enhanced  both  CD8+  and  CD4+  T-cell  proliferation  but  inhibited 

180  expression  of  CD107a,  CCL4,  and  IFN-y  (Fig.  1A,  and  data  not  shown).  Thus,  ascites 

181  fluid  has  widely  variable  effects  on  T-cell  functions  between  patient  samples,  in  accord 

182  with  the  heterogenous  nature  of  EOC. 

Although a systematic evaluation of the myriad immunosuppressive factors in EOC [1,2] was beyond the scope of this study, we did investigate the possible influence of Tregs and TGF-β. The percentage of Tregs ranged from 2.5% to 6.8% of total cells but showed no correlation with the degree of inhibition of T-cell proliferation or other functions (Table 2; Pearson correlation, p > 0.05 for all comparisons). Likewise, TGF-β levels did not correlate with the extent of T-cell inhibition (Table 2; Pearson correlation, p > 0.05 for all comparisons). These findings underscore the complex immunobiology of the ovarian tumor environment, in that the degree of immunosuppression did not correlate with either of these common immunosuppressive factors.
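The correlation analysis described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis script: the Treg percentages are taken from Table 2, but the inhibition values are HYPOTHETICAL placeholders, because the per-patient inhibition values appear only in the figures. The function names are our own. With five patients there are 3 degrees of freedom, so the two-sided 5% critical value of Student's t is 3.182.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0 with n - 2 degrees of freedom."""
    remainder = 1 - r * r
    if remainder <= 0:  # perfect correlation
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / remainder)


T_CRIT_DF3 = 3.182  # two-sided 5% critical value of Student's t, 3 df

# Treg percentages for IROC008, 028, 034, 036, 038 (Table 2).
treg_percent = [5.2, 6.8, 2.9, 3.3, 2.5]
# HYPOTHETICAL placeholder values for "% inhibition of one T-cell function".
inhibition_placeholder = [40.0, 15.0, 55.0, 20.0, 35.0]

r = pearson_r(treg_percent, inhibition_placeholder)
t = t_statistic(r, len(treg_percent))
significant = abs(t) > T_CRIT_DF3  # True would correspond to p < 0.05
```

With so few patients, |r| has to exceed roughly 0.88 before the test reaches p < 0.05, so an absence of significant correlation in a five-sample series should be read cautiously.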

Exogenous cytokines only partially restore T-cell function in the EOC ascites environment

In our mouse model of EOC, IL-2/IL-15 signaling was critical for CD8+ T-cell proliferation and function in ascites [6]. However, the preceding experiments revealed that the vast majority of T cells in EOC ascites did not express detectable levels of IL-2. This led us to speculate that the addition of IL-2 to cultures might restore T-cell proliferation and function. To test this, bulk ascites cells were stimulated with α-CD3 in the presence of 100% ascites fluid with or without exogenous IL-2, and T-cell proliferation and function were measured 48 h later. In 4/5 patient samples, IL-2 fully reversed the effects of ascites on T-cell proliferation and indeed induced responses greater than those seen in complete media (Fig. 1B). By contrast, IL-2 had little effect on the expression of CD107a, CCL4, IFN-γ, or IL-2 for most patient samples, although it caused a weak but statistically significant enhancement of TNF-α expression (Fig. 1B). Thus, IL-2 alone enhanced T-cell proliferation but, in general, had negligible effects on other functions.

We next asked if the engagement of other signaling pathways might better enhance T-cell function. Whereas IL-2 activates primarily the STAT5 pathway, the related cytokine IL-12 activates the STAT4 pathway, which has been shown to enhance T-cell function in various physiological settings [36]. When added to EOC cultures, IL-12 alone had only minor effects on the proliferation or function of CD8+ T cells in most patient samples (Fig. 1B). We next evaluated IL-21, which predominantly activates the STAT1 and STAT3 pathways [37]. Similar to IL-12, IL-21 had only modest effects on T-cell proliferation or function (Fig. 1B). In general, similar trends were seen with CD4+ T cells (data not shown).

Reasoning that a broader degree of functional enhancement might be achieved by simultaneously activating multiple STAT pathways, we assessed two combinations of cytokines: a) IL-2 + IL-12, and b) IL-2 + IL-12 + IL-21. Importantly, the latter combination would theoretically activate all STAT pathways relevant to CD8+ T-cell responses (i.e., STAT1, STAT3, STAT4 and STAT5). In general, both cytokine combinations induced T-cell proliferation similar to that seen with IL-2 (compare Fig. 2 with Fig. 1B). Notably, however, CD8+ T cells from IROC028, which failed to proliferate in response to any single cytokine (Fig. 1B), proliferated in response to both cytokine combinations (Fig. 2). Moreover, both cytokine combinations were modestly more effective than single cytokines at enhancing IFN-γ and TNF-α production (Fig. 2). Despite this, neither cytokine combination enhanced expression of CD107a or IL-2 (Fig. 2). Similar results were seen with CD4+ T cells, although the effects of cytokines were generally weaker (data not shown). In summary, the addition of cytokine combinations engaging a wide range of STAT signaling pathways failed to fully restore T-cell function in the EOC ascites environment.

Finally, we assessed whether activation of non-STAT signaling pathways could further enhance T-cell function. To this end, we evaluated the cytokine IL-18, which belongs to the IL-1 cytokine family, activates the MyD88/NF-κB pathway, and can directly enhance CD8+ T-cell responses in a variety of settings [38,39,40]. IL-18 alone had weak effects on T-cell proliferation and function (Fig. 3A). However, when combined with IL-2 + IL-12, IL-18 significantly enhanced expression of IFN-γ and, to a lesser extent, TNF-α (Fig. 3A). Moreover, the combination of IL-2 + IL-12 + IL-18 significantly increased the amount of IFN-γ produced on a per-cell basis, as evidenced by increased mean fluorescence intensity (MFI) (Fig. 3B). Nonetheless, the combination of IL-2 + IL-12 + IL-18 was unable to significantly enhance expression of CD107a or IL-2 (Fig. 3A). Similar results were seen for CD4+ T cells, although IL-18 did not increase the MFI of IFN-γ (data not shown).
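The distinction drawn above between the frequency of IFN-γ-positive cells and the amount of IFN-γ per cell can be made concrete with a small sketch. The intensities and the gate value below are HYPOTHETICAL, the function name is our own, and MFI is computed here as the arithmetic mean of the positive events, which is an assumption about how the statistic was derived.

```python
def percent_positive_and_mfi(intensities, gate):
    """Return (% of events above the gate, mean intensity of those positive events)."""
    if not intensities:
        raise ValueError("no events to analyze")
    positive = [x for x in intensities if x > gate]
    percent = 100.0 * len(positive) / len(intensities)
    mfi = sum(positive) / len(positive) if positive else float("nan")
    return percent, mfi


# HYPOTHETICAL fluorescence values for two stimulation conditions.
GATE = 100.0
condition_a = [10.0, 20.0, 500.0, 600.0]
condition_b = [10.0, 20.0, 1500.0, 1600.0]

pct_a, mfi_a = percent_positive_and_mfi(condition_a, GATE)
pct_b, mfi_b = percent_positive_and_mfi(condition_b, GATE)
```

In this toy example both conditions contain the same fraction of positive cells, yet the second has a higher MFI, which is the pattern that a per-cell increase in cytokine production would produce.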

Exogenous cytokines induce new polyfunctional T-cell profiles

In theory, the enhanced activity of cytokine combinations compared to single cytokines could reflect either (a) different cytokines stimulating distinct subpopulations of T cells, or (b) cytokines acting synergistically to induce single T cells to perform multiple functions (i.e., induction of polyfunctional T cells). To investigate this issue, we used Boolean gate analysis to study the different functional permutations displayed by CD8+ and CD4+ T cells under different stimulation conditions. As seen in the top panel of Fig. 4, when bulk ascites cells were stimulated in complete media, only five predominant functional permutations were seen among CD8+ T cells: (a) non-functional (i.e., zero function) T cells (data not shown); (b) mono-functional T cells expressing CCL4 or CD107a (permutations 2 and 5); (c) bi-functional T cells expressing both CD107a and CCL4 (permutation 13); and (d) tri-functional T cells expressing CD107a, CCL4, and IFN-γ (permutation 24). In the presence of 100% ascites fluid, the number of non-functional T cells increased, whereas the other functional permutations were reduced, with the exception of mono-functional T cells expressing CCL4 (Fig. 4, second panel). In general, addition of single cytokines failed to significantly change the functional permutations seen with ascites alone, although IL-2 had weak effects on several permutations (Supplementary Fig. 1). In contrast, the addition of cytokine combinations (either IL-2 + IL-12 + IL-21, or IL-2 + IL-12 + IL-18) markedly decreased the number of zero- and mono-functional T cells and generated three new polyfunctional permutations: a) CCL4 and IFN-γ (permutation 10); b) CCL4, IFN-γ, and TNF-α (permutation 17); and c) CCL4, IFN-γ, TNF-α, and CD107a (permutation 28) (Fig. 4, third and fourth panels). Similar trends were seen for CD4+ T cells, although the effects were generally weaker (data not shown). Thus, these combinations of cytokines appear to act synergistically to induce polyfunctional responses by individual T cells as opposed to stimulating multiple subpopulations of mono-functional T cells.
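The logic of Boolean gate analysis can be illustrated with a short sketch. It is not the software used in this study: the marker keys, function names, and example cells are HYPOTHETICAL, and the permutation numbers quoted in the text (2, 5, 13, and so on) come from the authors' analysis software and are not reproduced here. The sketch only shows that five binary markers give 2^5 - 1 = 31 non-empty combinations and that each cell falls into exactly one of them (or into the zero-function group).

```python
from collections import Counter
from itertools import combinations

# ASCII keys standing in for CD107a, CCL4, IFN-gamma, TNF-alpha and IL-2.
MARKERS = ("CD107a", "CCL4", "IFNg", "TNFa", "IL2")


def functional_permutations(markers=MARKERS):
    """Every non-empty combination of markers (31 for five markers)."""
    return [frozenset(combo)
            for size in range(1, len(markers) + 1)
            for combo in combinations(markers, size)]


def boolean_gate(cells, markers=MARKERS):
    """Count cells per combination; a cell maps marker -> True if gated positive."""
    counts = Counter()
    for cell in cells:
        counts[frozenset(m for m in markers if cell.get(m, False))] += 1
    return counts


def percent_of_total(counts):
    """Express each combination as a percentage of all cells analyzed."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells to summarize")
    return {combo: 100.0 * n / total for combo, n in counts.items()}


# HYPOTHETICAL gated events: one zero-function, one mono-, one bi-, one tetra-functional.
cells = [
    {},
    {"CCL4": True},
    {"CCL4": True, "IFNg": True},
    {"CCL4": True, "IFNg": True, "TNFa": True, "CD107a": True},
]
profile = percent_of_total(boolean_gate(cells))
```

Because every cell is assigned to a single combination, an increase in a polyfunctional group together with a decrease in the zero- and mono-functional groups points to individual cells acquiring functions, rather than to separate mono-functional subpopulations expanding side by side.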

Discussion

We show here that the ascites environment in human EOC has variable effects on CD8+ and CD4+ T-cell proliferation and indeed can even enhance proliferation in some cases. By contrast, ascites generally inhibits expression of IFN-γ, TNF-α, CCL4, and CD107a by T cells, resulting in a preponderance of zero- or mono-functional T cells. Although IL-2 promoted T-cell proliferation, it was largely ineffective at enhancing other T-cell functions, as were other single-agent cytokines (IL-12, IL-18 and IL-21). In contrast, combinations of cytokines that activate complementary signaling pathways (in particular IL-2 + IL-12 + IL-18) were able to enhance expression of IFN-γ and, to a lesser extent, TNF-α and CCL4. Nevertheless, expression of IL-2 and CD107a was unaffected by any combination of cytokines. Thus, exogenous cytokines can partly restore T-cell function in the human EOC environment; however, alternative strategies will be required to fully restore T-cell function.

One limitation of this study is the relatively small sample size, which was necessary given the number of cytokines and functional read-outs involved. Despite the small sample size, there was a notable degree of similarity between the overall functional profiles of all five patient samples both at baseline and after cytokine stimulation. This suggests that the diverse immunosuppressive mechanisms present in ovarian cancer might ultimately converge toward a common T-cell functional state governed by a conserved set of regulatory signals.

The inability of the clinical cytokines and cytokine combinations tested here to enhance T-cell degranulation and IL-2 production in the EOC ascites environment raises important considerations for the cytokine therapy of ovarian cancer. Given that granule exocytosis is a major mechanism by which T cells kill virus-infected and transformed cells [41], the inability of T cells to degranulate is expected to significantly impair anti-tumor responses. Intriguingly, whereas we found that cytokines were ineffective at reversing ascites-mediated inhibition of degranulation, previous studies have shown that the combination of IL-2 with TCR stimulation [7] or IL-12 [8,9] could fully restore LAK cytotoxicity in the presence of 50% ascites fluid. This discrepancy could reflect differences in cell culture conditions, cell type analyzed, and/or the method used to assess cytotoxic status. In the prior studies with LAK cells [7,8,9], perhaps the observed cytotoxic activity was mediated by the Fas/FasL pathway rather than cytotoxic granule release. The inability of T cells to produce IL-2 is also expected to impair the anti-tumor response. In our mouse model of ovarian cancer [6], tumor regression was entirely dependent on IL-2/IL-15 signaling. Likewise, Gattinoni and colleagues demonstrated that effector CD8+ T cells capable of potent in vitro anti-tumor cytotoxicity and IFN-γ production, but not IL-2 production, had limited anti-tumor activity in vivo, whereas T cells that had low in vitro cytotoxicity, but high IL-2 production, mediated superior anti-tumor responses [10]. As CD8+ T cells generally lose the ability to produce IL-2 upon differentiation, the failure of cytokines to enhance IL-2 production suggests that most tumor-associated T cells may be terminally differentiated effectors [11]. If so, future efforts should be directed toward isolating and expanding less differentiated subsets of T cells with greater pluripotency.

Even though the combination of IL-2 + IL-12 + IL-18 did not rescue expression of CD107a and IL-2, it potently enhanced the amount of IFN-γ produced by individual CD8+ T cells (Fig. 3B). IFN-γ plays a central role in anti-tumor immunity by inhibiting tumor proliferation and angiogenesis, and upregulating tumor antigen presentation. Endogenous IFN-γ protects against tumor development and, in a number of tumor models, is critical for anti-tumor immunity [42]. Moreover, IFN-γ and IFN-γ receptor expression, as well as several genes downstream of IFN-γ, are associated with increased survival in human EOC [1]. How might IL-2, IL-12 and IL-18 cooperatively enhance IFN-γ production? First, these cytokines reciprocally upregulate expression of their respective receptors, resulting in enhanced sensitivity of T cells to all three cytokines [43,44,45,46]. Second, these cytokines can mediate synergistic and complementary signaling events [36,47,48]. For example, IL-2 and IL-12 synergistically activate the p38 MAPK pathway to enhance IFN-γ expression [49]. Likewise, IL-12 and IL-18 activate the transcription factors STAT4 and AP-1, respectively, which can synergistically enhance IFN-γ promoter activity [50]. Thus, the coordinated activation of both STAT and non-STAT signaling pathways appears important for maximal expression of IFN-γ and other effector cytokines.

The combination of IL-2 + IL-12 + IL-18 also induced significant changes in polyfunctional T-cell permutations. Although the clinical significance of these polyfunctional profiles has yet to be determined for EOC, insights can be gained from other disease settings. Two of the four functional permutations seen at baseline (specifically, mono-functional CD107a, and bi-functional CD107a and CCL4; Fig. 4) are also elevated in CD8+ T cells from HIV progressors compared to non-progressors, suggesting that these permutations may represent functionally exhausted T cells [17,18]. Addition of IL-2 + IL-12 + IL-18 decreased the number of CD8+ T cells exhibiting these "exhausted" functional permutations. Moreover, it caused the emergence of three new polyfunctional permutations: a) CCL4 and IFN-γ; b) CCL4, IFN-γ, and TNF-α; and c) CCL4, IFN-γ, TNF-α, and CD107a (Fig. 4). Notably, these three permutations were shown to be induced in CD8+ T cells by a protective vaccine against smallpox [12]. Furthermore, tetra-functional (CCL4, IFN-γ, TNF-α, and CD107a) CD8+ T cells are found at significantly higher frequencies in HIV long-term non-progressors compared to progressors [17,18]. Finally, a higher proportion of tumor-reactive T cells co-expressing CCL4, IFN-γ, and TNF-α was detected in melanoma patients with favorable clinical responses to α-CTLA-4 treatment [32]. These prior reports suggest that the polyfunctional permutations induced by IL-2 + IL-12 + IL-18 may have therapeutic benefit in EOC, although this suggestion needs to be validated in vivo.

Our findings shed new light on previous clinical trials in EOC involving IL-2 and IL-12. Intraperitoneal administration of IL-2 resulted in an approximately 25% overall response rate in two studies [51,52], while IL-2 in combination with retinoic acid as a maintenance therapy demonstrated a relatively modest, but statistically significant, prolongation of progression-free and overall survival in a phase II trial [53,54]. By contrast, IL-12 has shown little therapeutic efficacy in EOC in two clinical trials [55,56]. The limited efficacy of IL-2 and IL-12 as monotherapies is consistent with our results demonstrating the modest functional activity of these cytokines as single agents. Our results would also predict that IL-18 and IL-21 may have limited clinical efficacy when used as monotherapies. Indeed, although IL-18 and IL-21 have not been evaluated in EOC patients, these cytokines as single agents showed limited clinical benefit in the setting of metastatic melanoma [57,58,59,60,61,62]. When combined, IL-2, IL-12, and IL-18 showed the most promise in the in vitro experiments reported here, but unfortunately these cytokines would likely have significant side effects if given in combination to patients [45,63,64]. While it may be possible to reduce toxicity by optimizing dose and route of delivery (e.g., intraperitoneal administration) [45,65], one is still left with the fact that this combination does not fully restore T-cell function, as manifested by impaired expression of CD107a and IL-2. Thus, our findings indicate that alternative therapeutic strategies involving other signaling pathways will be required to unleash the full potential of host T-cell responses against EOC.

Materials and Methods

Study subjects and specimens.

Ethics Statement. Newly diagnosed patients with high-grade serous EOC gave written informed consent under protocols approved by the Research Ethics Board of the BC Cancer Agency and the University of British Columbia.

Tumor tissue and ascites were obtained during primary surgery prior to any other treatment. Ascites was centrifuged at 300 × g, and supernatants (ascites fluid) were stored at -80°C. Ascites cell (AC) pellets containing large quantities of red blood cells were treated with ACK lysis buffer. AC pellets were cryopreserved in liquid nitrogen. Upon thawing, ascites cells were rested in complete RPMI (RPMI 1640 with 10% FBS, 25 mM HEPES, 1 mM sodium pyruvate, 2 mM L-glutamine, and 50 µM β-mercaptoethanol) for 4 h at 37°C prior to experiments.

Antibodies and cytokines. Flow cytometry was performed with the following fluorochrome-conjugated antibodies (BD Biosciences): CD3 (FITC, PECy5), CD4 (FITC, PE), CD8 (PECy5, APC-H7), CD14 (PE), CD19 (PE), CD25 (PECy5), CD56 (PE), CD107a (PECy5), Ki-67 (FITC), CCL4 (MIP-1β, PE), IL-2 (APC), TNF-α (PECy7), and IFN-γ (PE, AlexaFluor 700). Foxp3 (PE) was from eBioscience. Recombinant human IL-12, IL-21 (both Peprotech), and IL-18 (R&D Systems) were used at 100 ng/ml. IL-2 (Proleukin) was used at 100 U/ml. TGF-β (latent and active) in ascites fluid was quantified by ELISA (eBioscience).

Proliferation assays. Ascites cells (AC) were seeded in triplicate in 96-well flat-bottom plates at 1 × 10^5 cells per well. AC were resuspended in media or 100% ascites fluid and were left unstimulated or stimulated with plate-bound α-CD3ε antibody (clone OKT3, eBioscience) previously coated at 2.5 µg/ml for 2 h at 37°C, in the presence or absence of cytokines. T-cell proliferation was measured by detection of Ki-67 using flow cytometry. Cells were washed with FACS buffer (1% FBS in PBS), permeabilized with ice-cold 100% methanol and incubated at -20°C for at least 1 h. Cells were then washed with FACS buffer and triple-stained with pre-titered antibodies to Ki-67, CD4, and CD8 for at least 30 min at room temperature in the dark. Cells were washed and analyzed with a FACSCalibur flow cytometer (Becton Dickinson). Data were analyzed using FlowJo software (Tree Star Inc.).

Assessment of T-cell functional markers. Bulk ascites cells were washed with serum-free RPMI, resuspended at 5 × 10^5 cells/ml in complete RPMI or autologous ascites fluid, and plated in 48-well plates at 3 × 10^5 cells/well with or without plate-bound α-CD3ε. Cytokines and α-CD107a antibody were added to appropriate wells, and cells were incubated for 48 h at 37°C. As per standard protocol, α-CD107a antibody was added during stimulation because CD107a, which is associated with the membranes of cytotoxic granules, transiently localizes to the surface of T cells undergoing cellular degranulation [35]. Thus, cell-surface expression of CD107a serves as a surrogate marker of cytotoxicity. As a positive control, PMA plus ionomycin (both from Sigma) were used at 40 ng/ml and 1.5 µg/ml, respectively. GolgiStop (BD Biosciences, 1:1500 dilution) and brefeldin A (Sigma, 5 µg/ml) were added for the final 6 h of culture. Cells were harvested, labeled with α-CD4 and α-CD8 antibodies, fixed and permeabilized with Cytofix/Cytoperm buffer (BD Biosciences) according to the manufacturer's instructions, washed with Perm/Wash buffer, and stored in FACS buffer overnight at 4°C. Cells were washed again with Perm/Wash buffer, and intracellular staining was performed using antibodies against CCL4, IL-2, TNF-α, and IFN-γ. Cells were fixed with 2% formaldehyde and stored in FACS buffer overnight at 4°C. Samples were analyzed with a BD Biosciences FACSVantage DIVA modified with the Octagon array. SSC-Area vs. SSC-W was used to gate out doublets. Data were analyzed with FlowJo software (TreeStar) and exported to PESTLE v1.6.1 (Mario Roederer, NIH) for further data analysis.
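The plating figures in this section imply a culture volume per well, which a short calculation makes explicit. This is a sketch under one assumption that the text does not state: that wells were filled directly from the 5 × 10^5 cells/ml suspension without further dilution. The function names are our own.

```python
def well_volume_ml(cells_per_well, cells_per_ml):
    """Volume of suspension needed to deliver the stated number of cells."""
    if cells_per_ml <= 0:
        raise ValueError("cell density must be positive")
    return cells_per_well / cells_per_ml


def amount_per_well(concentration_per_ml, volume_ml):
    """Amount of a reagent in one well at a given final concentration."""
    return concentration_per_ml * volume_ml


# Values from the Methods: 3 x 10^5 cells/well from a 5 x 10^5 cells/ml suspension.
volume = well_volume_ml(3e5, 5e5)        # ml per well
il12_ng = amount_per_well(100, volume)   # IL-12, IL-21 or IL-18 at 100 ng/ml
il2_units = amount_per_well(100, volume) # IL-2 at 100 U/ml
```

Under that assumption each well holds 0.6 ml, and therefore about 60 ng of IL-12, IL-21 or IL-18 and about 60 U of IL-2.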

Figure Legends

Fig. 1. Impact of ascites fluid and IL-2, IL-12, and IL-21 on CD8+ T-cell proliferation and function. Bulk ascites cells were stimulated with plate-bound α-CD3 in media or 100% autologous ascites fluid for 48 h. (A) Proliferation and function of CD8+ T cells in the presence of media or autologous ascites fluid (top panel) were assessed by measuring expression of Ki-67, CD107a, CCL4, IFN-γ, TNF-α, and IL-2 by flow cytometry. (B) Effect of IL-2 (second panel), IL-12 (third panel), and IL-21 (fourth panel) on CD8+ T-cell proliferation and various functions in the presence of autologous ascites fluid. Data have been normalized to α-CD3-stimulated cells in media (i.e., media values were subtracted from each stimulation condition). *There was a significant effect of the cytokine for enhancing the indicated function compared to cells stimulated in media (Wilcoxon matched-pairs signed-rank test, p < 0.05).

Fig. 2. Effects of cytokine combinations on CD8+ T-cell proliferation and function. Bulk ascites cells were stimulated with plate-bound α-CD3 in media or 100% autologous ascites fluid for 48 h. Proliferation and function of CD8+ T cells in the presence of media, autologous ascites fluid (top panel), or ascites fluid in the presence of IL-2 + IL-12 (middle panel), or IL-2 + IL-12 + IL-21 (bottom panel) were assessed by measuring expression of Ki-67, CD107a, CCL4, IFN-γ, TNF-α, and IL-2 by flow cytometry. Data have been normalized to α-CD3-stimulated cells in media (i.e., media values were subtracted from each stimulation condition). *There was a significant effect of the cytokine combination for enhancing the indicated function compared to cells stimulated in media (Wilcoxon matched-pairs signed-rank test, p < 0.05).

Fig. 3. Effects of IL-18 alone, or in combination with IL-2 and IL-12, on CD8+ T-cell proliferation and function. Bulk ascites cells were stimulated with plate-bound α-CD3 in media or 100% autologous ascites fluid for 48 h. Proliferation and function of CD8+ T cells in the presence of media or autologous ascites fluid (top panel) were assessed by measuring expression of Ki-67, CD107a, CCL4, IFN-γ, TNF-α, and IL-2 by flow cytometry. (A) Effect of IL-18 (middle panel) and IL-2 + IL-12 + IL-18 (bottom panel) on CD8+ T-cell proliferation and various functions in the presence of ascites fluid. Data have been normalized to α-CD3-stimulated cells in media (i.e., media values were subtracted from each stimulation condition). *There was a significant effect of IL-2 + IL-12 + IL-18 for enhancing the indicated function compared to cells stimulated in media (Wilcoxon matched-pairs signed-rank test, p < 0.05). (B) Mean fluorescence intensity (MFI) of IFN-γ-positive CD8+ T cells was determined by intracellular IFN-γ staining. *The effect of IL-2 + IL-12 + IL-18 was significantly greater than the other two cytokine combinations (Wilcoxon matched-pairs signed-rank test, p < 0.05).

Fig. 4. Effects of cytokine combinations on polyfunctional CD8+ T-cell responses. Bulk ascites cells were stimulated with plate-bound α-CD3 in media or 100% autologous ascites fluid for 48 h in the presence or absence of the indicated cytokine or cytokine combination. Boolean gate analysis was performed to quantify the number of T cells expressing each of 31 possible functional permutations. Shown are the results for CD8+ T cells stimulated in media, ascites fluid, or ascites fluid supplemented with the cytokine combinations IL-2 + IL-12 + IL-21 or IL-2 + IL-12 + IL-18. The frequency of T cells expressing a given permutation is expressed as a percentage of total CD8+ T cells. *For the indicated permutation, the effect of the cytokine combination was significantly greater than that seen with media (Wilcoxon matched-pairs signed-rank test, p < 0.05).
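The normalization and the Wilcoxon matched-pairs test named in the figure legends can be sketched as follows. This is an illustrative implementation, not the authors' analysis: the example values are HYPOTHETICAL, the function names are our own, and the legends do not state whether the test was one-sided, two-sided, exact, or approximate, so the sketch returns both exact p-values.

```python
from itertools import product


def normalize_to_media(condition, media):
    """Subtract each patient's media value from the matched condition value."""
    if len(condition) != len(media):
        raise ValueError("condition and media must be paired")
    return [c - m for c, m in zip(condition, media)]


def average_ranks(values):
    """Rank values from 1..n in ascending order, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def signed_rank_exact(differences):
    """Exact Wilcoxon signed-rank test; returns (W+, one-sided p, two-sided p)."""
    d = [x for x in differences if x != 0]  # zero differences carry no sign
    if not d:
        return 0.0, 1.0, 1.0
    ranks = average_ranks([abs(x) for x in d])
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    centre = sum(ranks) / 2
    at_least = 0
    as_extreme = 0
    # Under the null hypothesis every assignment of signs is equally likely.
    for signs in product((False, True), repeat=len(d)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        at_least += w >= w_plus
        as_extreme += abs(w - centre) >= abs(w_plus - centre)
    total = 2 ** len(d)
    return w_plus, at_least / total, as_extreme / total


# HYPOTHETICAL % positive cells for five patients (not data from this study).
media_values = [1.0, 2.0, 3.0, 4.0, 5.0]
cytokine_values = [5.0, 7.0, 9.0, 11.0, 13.0]
diffs = normalize_to_media(cytokine_values, media_values)
w_plus, p_one_sided, p_two_sided = signed_rank_exact(diffs)
```

With five matched pairs and no ties, the smallest exact p-value this procedure can return is 1/32 (about 0.031) one-sided and 2/32 (0.0625) two-sided, so the sidedness and the exact or approximate nature of the test matter when interpreting p < 0.05 in a five-patient series.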

Tables

Table 1. Patient clinical characteristics

Patient ID  Age at diagnosis  Pathologic diagnosis        Grade  FIGO Staging  TNM Staging
IROC008     70                Papillary serous carcinoma  3/3    4             pT3c, pN1
IROC028     61                Papillary serous carcinoma  3/3    3C            pT3c, NX, MX
IROC034     64                Papillary serous carcinoma  3/3    N/A           T3c
IROC036     60                Papillary serous carcinoma  3/3    N/A           T3c, NX, MX
IROC038     40                Papillary serous carcinoma  3/3    3B            pT3b

Median Age: 61; Mean Age: 59; Age Range: 40-70

N/A, not assessed

Table 2. Cellular composition and TGF-β levels in the ascites compartment of EOC patients (data are expressed as a percentage of total live cells unless otherwise stated)

                                  IROC Patient Sample
Parameter                         008    028 (a)  034    036    038
Lymphocytes (b)                   28.5   74.6     78.0   20.6   40.2
CD3+                              19.8   59.0     58.0   11.9   21.9
CD3+CD4+                          6.5    41.1     26.8   4.5    6.2
CD3+CD8+                          12.4   16.9     28.1   6.6    13.0
CD3+CD56+                         3.8    1.6      10.1   1.0    3.2
CD56+                             2.7    12.0     9.7    3.0    4.6
Tregs (c) (CD4+CD25+FoxP3+)       5.2    6.8      2.9    3.3    2.5
CD25+FoxP3+ in CD4+               12.4   12.1     12.1   6.8    10.7
B cells (c) (CD19+)               8.1    1.4      6.5    4.5    5.0
CD14+                             55.3   11.2     1.6    50.5   30.9
TGF-β (pg/ml)                     6.3    ND (d)   33.3   ND     ND

(a) Sample contained many large tumor rafts not detectable by FACS. Therefore, values are inflated, since visual inspection and IHC staining show that this sample is composed mainly of tumor cells.
(b) As determined by side and forward scatter.
(c) Tregs and B cells are expressed as a percentage of total live lymphocytes.
(d) ND, not detectable

Editorial

IDO  and  outcomes  in  ovarian  cancer 


Ovarian  cancer  affects  over  190,000  women  worldwide  each  year 
(International  Agency  for  Research  on  Cancer).  While  over  80%  of 
patients  are  highly  responsive  to  frontline  treatment  (cytoreductive 
surgery followed by taxane- and platinum-based chemotherapy), a
large  majority  experience  disease  recurrence  within  2-5  years  and 
ultimately  succumb  to  their  disease.  Despite  these  unfortunate 
statistics,  20-30%  of  ovarian  cancer  patients  survive  5  years  or  more 
after  diagnosis.  Favorable  prognostic  factors  include  early  stage,  non- 
serous  histology,  low  grade,  good  performance  status,  and  optimal 
surgical  debulking. 

In  the  past  few  years,  it  has  become  apparent  that  the  host 
immune  system  too  has  a  strong  influence  on  survival  from  ovarian 
cancer. Specifically, the presence of CD8+ T cells in tumor
epithelium has been associated with prolonged disease-free and
overall survival in numerous studies (reviewed in [1]). Other
features of CD8+ T cell responses are also associated with favorable
prognosis in ovarian cancer, including intratumoral levels of
interferon-γ and its receptor, IL-18, TNF-α, IRF-1, MHC class I
molecules and antigen processing machinery, and the cytolytic
granule component TIA-1 [1]. Concordant results have been
reported  in  a  wide  variety  of  other  human  cancers,  leading  to  the 
general  view  that  host  T  cell  responses  can  profoundly  influence 
clinical  outcomes  in  human  cancer. 

In  this  issue  of  Gynecological  Oncology,  Inaba  and  colleagues  [2] 
investigate  another  facet  of  the  immune  response  to  ovarian  cancer. 
Using  a  cohort  of  60  ovarian  cancer  cases  representing  a  range  of 
histological  subtypes,  stage  and  grade,  they  assessed  expression  of  the 
enzyme indoleamine-2,3-dioxygenase (IDO) in ovarian tumors. They
report that IDO expression is associated with high grade, significantly
fewer intraepithelial CD8+ T cell infiltrates, and decreased overall and
progression-free  survival.  These  findings  are  reminiscent  of  prior 
work  by  this  group  in  endometrial  cancer  [3,4],  and  by  others  in 
serous  ovarian  cancer  [5,6],  colorectal  cancer  [7],  and  hepatocellular 
carcinoma  [8],  all  indicating  an  inverse  association  between  IDO 
expression  and  clinical  outcomes. 

The association between IDO expression and reduced T cell
infiltrates fits with work by this group and others showing that IDO
can have immunosuppressive effects [9]. Activated lymphocytes, by
releasing IFN-γ, can induce IDO expression in a variety of tissues. In
turn, IDO can inhibit T cell proliferation and function by several
mechanisms. IDO degrades tryptophan, yielding breakdown products
called kynurenines. Thus, expression of IDO can deplete tryptophan
locally, leaving T cells starved for this amino acid. In addition,
kynurenines can directly cause T cell apoptosis. Finally, regulatory T
cells can induce expression of IDO on dendritic cells, which in turn can
inhibit the activation of naive T cells in tumor draining lymph nodes.
The relative importance of these different mechanisms to immune
suppression is unclear, and likely depends on physiological context
[9].

The  influence  of  IDO  on  physiological  immune  responses  also 
appears  to  depend  on  context.  Several  studies  have  demonstrated  a 
clear  immune  suppressive  role.  Munn  et  al.  [10]  showed  that  IDO 
expression  at  the  feto-maternal  interface  was  crucial  to  prevent 
rejection  of  allogeneic  fetuses  in  mice.  In  murine  tumor  models,  forced 
expression  of  IDO  protects  tumors  from  T  cell-mediated  rejection.  And 
in  the  allogeneic  transplant  setting,  IDO-deficient  mice  fail  to  control 
lethal CD8+ T cell responses. Despite these findings, IDO-/- mice
have  intact  central  and  peripheral  tolerance  and  do  not  develop 
autoimmunity.  Moreover,  in  a  study  of  human  kidney  allografts,  IDO 
expression  was  seen  in  rejected  organs  only  [11],  indicating  that  IDO 
expression  is  not  invariably  associated  with  immune  suppression. 

In  the  present  paper,  Inaba  and  colleagues  provide  thought- 
provoking  data  about  the  role  of  IDO  in  the  immune  response  to 
ovarian  cancer.  Based  on  their  findings  in  human  ovarian  tumors,  they 
transfected  IDO  into  the  ovarian  cancer  line  SKOV3  and  assessed  the 
in  vitro  and  in  vivo  consequences  on  cell  behavior.  When  evaluated 
in vitro, IDO-expressing cells showed normal morphology, proliferation,
migration, invasive activity, and sensitivity to the chemotherapeutic
agent paclitaxel. However, when an IDO-expressing clone was
injected intraperitoneally into nude mice, there was a marked increase
in the volume and extent of dissemination of IDO-expressing tumors
compared to the parental tumor line. This effect could be suppressed
by daily administration of the IDO inhibitor 1-methyl-tryptophan (1-MT).
While administration of 1-MT alone did not lead to increased
survival  of  mice,  1-MT  potentiated  the  therapeutic  effects  of 
paclitaxel,  as  reported  previously  by  this  group  in  a  model  of 
endometrial  cancer  [12]. 

While  the  therapeutic  effects  of  1-MT  and  paclitaxel  speak  for 
themselves,  the  immunological  basis  of  these  observations  is  unclear. 
Using  a  spontaneous  mammary  tumor  model,  others  have  shown  that 
1-MT  potentiates  the  effects  of  paclitaxel  by  a  T  cell-dependent 
mechanism  [13].  However,  the  present  study  used  nude  mice,  which 
lack  T  cells.  Nonetheless,  the  authors  note  that  nude  mice  contain  NK 
cells,  which  could  be  the  target  of  immune  suppression  by  IDO  and 
rescue by 1-MT. Indeed, this group has presented evidence that IDO
can suppress NK cell responses in nude mice bearing endometrial
tumors [12].

Given  the  enhanced  therapeutic  effects  of  1-MT  in  combination 
with  paclitaxel,  should  IDO  inhibitors  be  considered  for  the  treatment 
of  ovarian  cancer?  The  strong  association  between  tumor-infiltrating 
T  cells  and  survival  suggests  that  any  intervention  that  enhances  T  cell 
immunity  is  likely  to  be  clinically  beneficial,  provided  side  effects  are 
low.  However,  several  issues  need  to  be  resolved  in  the  case  of  IDO 
inhibitors. First, there is controversy over the relative efficacy of the D
and L isoforms in different models, an issue that needs clarification
prior to human trials [9,14]. Second, more information is needed
about the off-target effects of IDO inhibitors, such as the inhibition of
tryptophan transporters, which can even affect IDO-negative cells.
Third, another form of IDO was recently discovered (IDO2), and more
study is required into the role of IDO2 in immune suppression and
cancer progression [15]. Finally, the results by Inaba and colleagues
suggest there is more to be learned about the immunological effects of
IDO inhibitors, given the therapeutic effects they observed in T cell-
deficient hosts. In addition to the NK cell hypothesis they propose,
perhaps IDO can promote tumor growth and dissemination by non-
immunological mechanisms as well. For example, by depleting
tryptophan, IDO could potentially induce an altered metabolic state
in tumor cells that changes their growth and dissemination
properties. Thus, the present paper highlights the importance of IDO
in influencing ovarian cancer outcomes, and underscores the need to
better understand the underlying mechanisms to allow successful
translation of these findings to the clinic.

Abstract

Background:  Tumor-infiltrating  T  cells  are  associated  with  survival  in  epithelial  ovarian  cancer  (EOC),  but  their  functional 
status  is  poorly  understood,  especially  relative  to  the  different  risk  categories  and  histological  subtypes  of  EOC. 

Methodology/Principal  Findings:  Tissue  microarrays  containing  high-grade  serous,  endometrioid,  mucinous  and  clear  cell 
tumors  were  analyzed  immunohistochemically  for  the  presence  of  lymphocytes,  dendritic  cells,  neutrophils,  macrophages, 
MHC  class  I  and  II,  and  various  markers  of  activation  and  inflammation.  In  high-grade  serous  tumors  from  optimally 
debulked  patients,  positive  associations  were  seen  between  intraepithelial  cells  expressing  CD3,  CD4,  CD8,  CD45RO,  CD25, 
TIA-1,  Granzyme  B,  FoxP3,  CD20,  and  CD68,  as  well  as  expression  of  MHC  class  I  and  II  by  tumor  cells.  Disease-specific 
survival  was  positively  associated  with  the  markers  CD8,  CD3,  FoxP3,  TIA-1,  CD20,  MHC  class  I  and  class  II.  In  other 
histological  subtypes,  immune  infiltrates  were  less  prevalent,  and  the  only  markers  associated  with  survival  were  MHC  class 
II  (positive  association  in  endometrioid  cases)  and  myeloperoxidase  (negative  association  in  clear  cell  cases). 

Conclusions/Significance:  Host  immune  responses  to  EOC  vary  widely  according  to  histological  subtype  and  the  extent  of 
residual  disease.  TIA-1,  FoxP3  and  CD20  emerge  as  new  positive  prognostic  factors  in  high-grade  serous  EOC  from  optimally 
debulked patients.

Introduction

Ovarian  cancer  is  the  most  deadly  gynecologic  cancer,  affecting 
more  than  190,000  women  worldwide  each  year  (International 
Agency  for  Research  on  Cancer).  Delayed  diagnosis  and  the 
presence  of  widely  disseminated  disease  account  for  the  high 
mortality  associated  with  the  disease.  Additionally,  while  a  large 
percentage  of  patients  initially  respond  well  to  cytoreductive 
surgery  and  standard  chemotherapy,  the  disease  usually  recurs 
within  2-5  years  as  residual  tumor  cells  develop  resistance  to 
chemotherapy  [1,2].  Although  prognosis  is  often  poor,  numerous 
favorable  prognostic  indicators  have  been  described,  including 
early stage, low grade and optimal surgical debulking [3,4].

Several  recent  studies  have  analyzed  the  influence  of  host 
immunity  on  disease  prognosis.  Tumor-infiltrating  CD3+  T  cells 
are  strongly  associated  with  favorable  prognosis,  specifically  when 
CD3+ cells are localized within tumor epithelium [5-9]. These
findings  have  been  extended  to  the  CD8+  T  cell  subset  in 
particular [10-17], suggesting that cytotoxic T lymphocytes (CTLs)
play an important role in the antitumor immune response.
Accordingly, other factors associated with CTL responses are also
positively associated with survival, including interferon-γ (IFN-γ)
[18,19], the IFN-γ receptor [20], interferon regulatory factor
(IRF)-1 [21], IL-18 [22], TNF-α [23], MHC class I [24-26], and
MHC class I antigen processing machinery [17].

In  contrast  to  CD8+  T  cells,  several  studies  have  indicated  that 
tumor-infiltrating  CD25+FoxP3+  T  cells  (referred  to  as  regulatory 
T  cells  or  Tregs)  are  associated  with  decreased  survival  [10,27-29]. 
Tregs  have  the  ability  to  suppress  proliferation,  cytokine 
production,  and  cytolytic  activity  of  CD4+  and  CD8+  T  cells  by 
mechanisms  involving  cell-to-cell  contact  and  the  release  of 
cytokines such as TGF-β [30,31]. Tregs can also induce an
immunosuppressive phenotype in other cell types such as
macrophages [32]. Although Tregs have been associated with
poor prognosis in many cancers, several exceptions have recently
been reported. Leffers et al. found that FoxP3+ infiltrates in
advanced stage EOC were associated with increased survival [14].
Similar findings have been reported in colorectal cancer [33] and
lymphoma [34-36]. Furthermore, in murine models, FoxP3+ cells
can  play  a  positive  role  in  anti-tumor  and  anti-viral  immunity 
[37,38].  The  precise  role  of  regulatory  T  cells  in  cancer  outcomes 
warrants  further  consideration  given  that  several  groups  are 
attempting  to  enhance  tumor  immunity  by  depleting  FoxP3+ 
Tregs  from  cancer  patients  [39-44],  including  EOC  patients  [45]. 

In  addition  to  Tregs,  other  cell  types  reportedly  play  an 
immunosuppressive  role  in  EOC.  For  example,  plasmacytoid  dendritic 
cells  contribute  to  immunosuppression  by  promoting  the  development 
or recruitment of interleukin-10-producing CD4+ and CD8+
regulatory T cells [46,47]. Myeloid dendritic cells (MDCs) impair T
cell immunity by expressing B7-H1, a ligand for the inhibitory receptor
PD-1 found on T cells [48]. Monocytes and macrophages in the EOC
microenvironment can be polarized toward a so-called M2 phenotype,
which is typified by the expression of IL-10, TGF-β and scavenger
receptors  and  is  thought  to  promote  tumor  progression  [49,50,51]. 
Under  the  influence  of  IL-6  and  IL-10,  macrophages  in  EOC  can  also 
express  B7-H4,  which  inhibits  T  cell  proliferation  [52].  Macrophages 
also  produce  CCL22,  which  promotes  Treg  recruitment  to  the  tumor 
environment  [32].  Finally,  expression  of  the  inflammatory  mediator 
COX-2  in  tumor  epithelium  has  been  associated  with  reduced 
lymphocyte  infiltration  and  poor  prognosis  in  EOC  [13,53]. 

With  the  advent  of  tumor  tissue  microarray  (TMA)  technology, 
a  large  number  of  retrospective  studies  have  investigated  the 
relationship between tumor-infiltrating immune cells and prognosis
in EOC and other cancers. However, most studies focus on one
or a few markers, such that associations between different
immunological factors may be missed. Additionally, most studies
fail to address the different histological subtypes of EOC, which
are now recognized to behave as distinct diseases [54]. As a result,
there  are  inconsistencies  and  unresolved  issues  in  the  literature 
concerning  the  prognostic  significance  of  different  immune  cell 
infiltrates.  To  address  this,  we  analyzed  several  large  series  of  EOC 
tumors,  including  high-grade  serous,  endometrioid,  clear  cell  and 
mucinous  subtypes,  for  the  presence  of  various  immune  cell 
infiltrates  and  inflammatory  markers.  Our  results  reveal  that  high- 
grade  serous  tumors  have  a  distinct  immunological  profile  that  is 
strongly  associated  with  patient  survival. 

Results 

Intraepithelial  T  cells  and  associated  functional  markers 
in  high-grade  serous  EOC 

We  initially  investigated  the  relationship  between  immune 
infiltrates  and  survival  in  a  cohort  of  199  high-grade  serous  EOC 
patients.  We  chose  to  first  focus  on  high-grade  serous  cases,  as  the 
other  histological  subtypes  exhibit  distinct  biological  and  clinical 
properties  that  are  potentially  confounding  [54,55].  This  initial 
cohort  was  restricted  to  patients  who  had  undergone  optimal 
cytoreduction  (i.e.,  without  evidence  of  macroscopic  residual 
disease).  Patient  characteristics  are  shown  in  Table  1. 

The tumors in this initial cohort had been previously assessed by
immunohistochemistry (IHC) for a variety of lymphocyte markers,
including CD3, CD4, CD8, CD20 and Granzyme B [12].
Intraepithelial lymphocytes (i.e., lymphocytes within the epithelial
component of the tumor) were scored as either present (i.e. one or
more intraepithelial lymphocytes present in at least one of two
0.6 mm cores) or absent. We re-analyzed this data focusing exclusively
on high-grade serous cases. We found that 83.2% (163/196) of
evaluable high-grade serous tumors were positive for intraepithelial
CD3+ T cells, whereas CD4+ and CD8+ intraepithelial cells
were found in 53.4% (103/193) and 84.0% (163/194) of evaluable
tumors, respectively (Fig. 1A&B and data not shown). CD4+ and
CD8+ cellular infiltrates showed a strong positive association
(p<0.0001). Table 2 shows statistical associations for these and all
other markers studied in this initial cohort.

Table 1. Clinical characteristics of the optimally debulked
high-grade serous EOC patient cohort.

Age at surgery (years)
    Mean                              61.00
    Std dev                           11.48
    Range                             37.59-85.96
    Median                            60.08
*Overall Survival (years)
    Mean                              5.59
    Std dev                           3.47
    Range                             0.4-17.4
    Median                            4.91
Silverberg Grade
    1                                 0
    2                                 56
    3                                 143
    Unknown                           0
Stage
    I                                 49
    II                                85
    III                               65
    IV                                0
    Unknown                           0
Total number of evaluable patients    199

*There were no deaths due to causes other than ovarian cancer, therefore
disease-specific and overall survival were equivalent.
doi:10.1371/journal.pone.0006412.t001
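
Because each marker was scored as present or absent, every pairwise association reported in Table 2 reduces to a Chi-square test on a 2x2 contingency table. The sketch below shows that calculation under stated assumptions: it uses the Pearson statistic without continuity correction (the text does not say whether a correction was applied), and the counts in the example are invented for illustration, not taken from the study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square test for the 2x2 table [[a, b], [c, d]].

    a: both markers present      b: first present, second absent
    c: first absent, second present      d: both absent
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one case")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the Chi-square tail probability is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: 50 tumors cross-classified for two markers.
example_stat, example_p = chi_square_2x2(20, 5, 5, 20)
```

With these invented counts the two markers co-occur far more often than chance would predict, so the p-value falls below 0.0001, the form in which most T cell marker pairs are reported in Table 2.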

While  the  above  markers  indicate  which  lymphocyte  subsets  are 
present  in  tumors,  they  do  not  reveal  their  activation  state.  To 
address  this  issue,  we  analyzed  tumors  for  expression  of  CD45RO, 
OX40 and CD25, which are expressed by activated T cells
[56,57]. Using the same scoring criteria as above, 70.6% (132/187)
of tumors were positive for intraepithelial CD45RO+ cells,
and 49.7% (96/193) were positive for intraepithelial CD25+ cells
(Fig. 1C&E). By contrast, only 7.0% (11/158) of tumors were
positive for intraepithelial OX40+ cells (Fig. 1D). In pair-wise
comparisons, CD45RO, CD25 and OX40 were all positively
associated (Table 2). Moreover, CD45RO and CD25 were both
associated with the presence of CD3+, CD4+ and CD8+ cells.
OX40 showed a similar trend, but this did not reach statistical
significance,  likely  due  to  the  low  number  of  positive  cases. 

To investigate the differentiation state of tumor-infiltrating T
cells, tissues were analyzed for intraepithelial cells expressing TIA-1
and Granzyme B, which are markers of CD8+ cytotoxic T cells
and NK cells [58-60]. A majority of tumors (66.5%, 127/191)
were positive for intraepithelial TIA-1+ cells (Fig. 1F), and about
half of tumors (45.6%, 88/193) were positive for intraepithelial
Granzyme B+ cells (Fig. 1G). There was a highly significant
association between TIA-1+ and Granzyme B+ cells (p<0.0001).
Moreover, in pair-wise comparisons, TIA-1+ and Granzyme B+
cells were each associated with the activation markers CD45RO,
CD25 and OX40. Finally, TIA-1+ and Granzyme B+ cells were
each associated with the presence of CD3+, CD4+ and CD8+ cells
(Table 2). To examine whether TIA-1 and Granzyme B
expression could be due to the presence of NK or NKT cells, we
examined tumors for the NK cell markers CD56 and CD57. For
both markers, there were either few or no infiltrates at all within
the tumor epithelium (data not shown), indicating that the TIA-1+
and Granzyme B+ infiltrates were most likely T cells.

Figure 1. Immunohistochemical analysis of high-grade serous EOC tumors showing infiltrates expressing markers of T cell
differentiation and activation. (A) CD4, (B) CD8, (C) CD45RO, (D) OX40, (E) CD25, (F) TIA-1, (G) Granzyme B, and (H) FoxP3. 40X objective.
doi:10.1371/journal.pone.0006412.g001

Table 2. p-values for Chi-square tests of associations between immunohistochemical markers in the optimally debulked high-
grade serous EOC cohort.

            CD3      CD8      CD4      CD45RO   CD25     OX40     TIA-1    GrB(1)
CD3         -        <0.0001  <0.0001  <0.0001  <0.0001  0.16     <0.0001  <0.0001
CD8         <0.0001  -        <0.0001  <0.0001  <0.0001  0.15     <0.0001  <0.0001
CD4         <0.0001  <0.0001  -        0.0021   <0.0001  0.049    0.0013   0.0001
CD45RO      <0.0001  <0.0001  0.0021   -        <0.0001  0.031    <0.0001  <0.0001
CD25        <0.0001  <0.0001  <0.0001  <0.0001  -        0.0040   <0.0001  <0.0001
OX40        0.16     0.15     0.049    0.031    0.0040   -        0.012    0.044
TIA-1       <0.0001  <0.0001  0.0013   <0.0001  <0.0001  0.012    -        <0.0001
GrB(1)      <0.0001  <0.0001  0.0001   <0.0001  <0.0001  0.044    <0.0001  -
FoxP3       <0.0001  <0.0001  <0.0001  <0.0001  <0.0001  0.47     <0.0001  <0.0001
MHC I(2)    <0.0001  <0.0001  <0.0001  <0.0001  <0.0001  0.18     <0.0001  0.0001
MHC II(3)   0.0078   0.0009   0.013    0.0004   0.0001   0.38     0.0006   0.016
CD20        <0.0001  <0.0001  0.0006   0.0036   0.0006   0.13     <0.0001  <0.0001
CD1a        0.11     0.096    0.046    0.21     0.27     0.030    0.31     0.48
CD68        <0.0001  0.0001   0.0028   0.0019   0.0006   0.23     0.0005   0.0007
MPO(4)      0.89     0.53     0.034    0.90     0.78     0.33     0.67     0.49
COX-2(5)    0.75     0.64     0.55     0.64     0.74     0.76     0.42     0.70

            FoxP3    MHC I(2) MHC II(3) CD20     CD1a     CD68     MPO(4)   COX-2(5)
CD3         <0.0001  <0.0001  0.0078    <0.0001  0.11     <0.0001  0.89     0.75
CD8         <0.0001  <0.0001  0.0009    <0.0001  0.096    0.0001   0.53     0.64
CD4         <0.0001  <0.0001  0.013     0.0006   0.046    0.0028   0.034    0.55
CD45RO      <0.0001  <0.0001  0.0004    0.0036   0.21     0.0019   0.90     0.64
CD25        <0.0001  <0.0001  0.0001    0.0006   0.27     0.0006   0.78     0.74
OX40        0.47     0.18     0.38      0.13     0.030    0.23     0.33     0.76
TIA-1       <0.0001  <0.0001  0.0006    <0.0001  0.31     0.0005   0.67     0.42
GrB(1)      <0.0001  0.0001   0.016     <0.0001  0.48     0.0007   0.49     0.70
FoxP3       -        <0.0001  <0.0001   0.0009   0.35     0.0001   0.42     0.82
MHC I(2)    <0.0001  -        <0.0001   0.0015   0.13     0.035    0.61     0.17
MHC II(3)   <0.0001  <0.0001  -         0.023    0.11     0.22     0.39     0.30
CD20        0.0009   0.0015   0.023     -        0.87     0.029    0.60     0.13
CD1a        0.35     0.13     0.11      0.87     -        0.52     0.046    0.84
CD68        0.0001   0.035    0.22      0.029    0.52     -        0.43     0.30
MPO(4)      0.42     0.61     0.39      0.60     0.046    0.43     -        0.51
COX-2(5)    0.82     0.17     0.30      0.13     0.84     0.30     0.51     -

(1) GrB = Granzyme B.
(2) MHC I = MHC class I.
(3) MHC II = MHC class II.
(4) MPO = myeloperoxidase.
(5) COX-2 = Cyclooxygenase-2.

Finally,  tumors  were  analyzed  for  the  presence  of  intraepithelial 
cells  expressing  FoxP3,  which  in  humans  is  a  marker  of  regulatory 
T  cells  and  activated  T  cells  [61,62].  About  half  of  tumors  (52.9%, 
100/189)  were  positive  for  intraepithelial  FoxP3+  cells  (Fig.  1H). 
There was a strong association between FoxP3+ and CD25+ cells
(p<0.0001), and FoxP3+ and CD25+ cells were each strongly
associated with CD4+ cells (p<0.0001 for both markers). Thus,
consistent with previous reports [10,14,27,28], a significant
proportion of tumors contained intraepithelial infiltrates with
markers characteristic of Tregs (CD4+, CD25+, and FoxP3+).

MHC  class  I  and  II  in  high-grade  serous  EOC 

We analyzed tumor cells for expression of MHC class I and II
using a four-point scale (negative, focal [<10%], patchy [10-50%]
or diffuse [>50%]). All evaluable tumors (185/185) expressed
MHC class I to some degree (i.e., focal, patchy or diffuse), indicating
they could theoretically present antigen to CD8+ T cells. For
statistical analyses, only the highest category (diffuse, >50%) was
considered positive (Fig. 2A&B). Using this threshold, 85.4% (158/185)
of tumors were positive for MHC class I. MHC class I was
positively associated with all three T cell subsets (CD3, CD4, and
CD8), the activation markers CD45RO and CD25, and the
differentiation markers TIA-1, Granzyme B and FoxP3 (Table 2).

Figure 2. Immunohistochemical analysis of high-grade serous EOC tumors showing (A,B) high and low expression of MHC class I,
(C,D) high and low expression of MHC class II, and (E,F) high and low expression of COX-2. 40X objective.
doi:10.1371/journal.pone.0006412.g002

A  large  majority  of  tumors  (86.5%,  166/192)  expressed  MHC 
class  II  to  some  degree  (i.e.,  focal,  patchy  or  diffuse),  indicating  they 
could  theoretically  present  antigen  to  CD4+  T  cells.  As  with  MHC 
class  I,  only  the  highest  category  (diffuse,  >50%)  was  considered 
positive  for  statistical  analyses  (Fig.  2  C&D).  Using  this  threshold, 
41.1%  (79/192)  of  tumors  were  positive  for  MHC  class  II.  MHC 
class  II  was  strongly  associated  with  MHC  class  I  (p<0.0001). 
Accordingly,  MHC  class  II  was  positively  associated  with  all  three  T 
cell subsets (CD3, CD4, and CD8), the activation markers CD45RO

.  PLoS  ONE  |  www.plosone.org 


and  CD25,  and  the  differentiation  markers  TIA-1,  Granzyme  B  and 
FoxP3  (Table  2).  Similar  to  the  results  for  MHC  class  I,  the 
expression  of  MHC  class  II  in  tumor  epithelium  was  positively 
associated  with  various  T  cell  markers,  including  CD3,  CD4,  CD8, 
CD45RO,  TIA-1,  Granzyme  B,  CD25  and  FoxP3  (Table  2). 
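
The four-point MHC scale and the "diffuse only" positivity threshold described above can be written out as a short sketch. The function names are hypothetical, and two details are assumptions not stated in the text: "negative" is taken to mean 0% staining, and the boundary values of exactly 10% and 50% are assigned to the patchy category.

```python
def mhc_category(percent_stained):
    """Map the percentage of stained tumor cells to the four-point scale."""
    if not 0 <= percent_stained <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_stained == 0:      # assumption: negative means no staining
        return "negative"
    if percent_stained < 10:
        return "focal"
    if percent_stained <= 50:     # assumption: 10% and 50% count as patchy
        return "patchy"
    return "diffuse"


def is_positive(percent_stained):
    """Only the highest category (diffuse, >50%) counts as positive."""
    return mhc_category(percent_stained) == "diffuse"


def percent_of_cases(n_positive, n_evaluable):
    """Percentage of evaluable tumors scored positive, to one decimal place."""
    return round(100.0 * n_positive / n_evaluable, 1)


# Reproduces the reported proportions from the counts given in the text.
mhc_class_i_positive = percent_of_cases(158, 185)
mhc_class_ii_positive = percent_of_cases(79, 192)
```

Applied to the counts in the text, the last two lines give 85.4% for MHC class I and 41.1% for MHC class II, matching the reported values.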

Intraepithelial  B  cells  in  high-grade  serous  EOC 

Tissues  were  stained  with  an  antibody  to  CD20,  which  is 
expressed  by  B  cells  from  the  naive  to  memory  stages  of 
differentiation  [63].  Intraepithelial  CD20+  cells  were  found  in 
41.9%  (83/198)  of  evaluable  tumors  (Fig.  3A).  CD20+  infiltrates 
were  strongly  associated  with  all  three  T  cell  subsets  (CD3,  CD4, 
and  CD8);  the  activation  markers  CD45RO  and  CD25;  the 
differentiation  markers  TIA-1,  Granzyme  B  and  FoxP3;  and  both 
MHC class I and II (Table 2).

Figure 3. Immunohistochemical analysis of high-grade serous EOC tumors showing infiltrates expressing (A) CD20 (B cells), (B)
CD1a (immature DCs), (C) Myeloperoxidase (granulocytes), and (D) CD68 (macrophages). 40X objective.
doi:10.1371/journal.pone.0006412.g003


Intraepithelial  dendritic  cells,  granulocytes  and 
macrophages  in  high-grade  serous  EOC 

Tumors were analyzed for the presence of immature and
mature dendritic cells by staining for CD1a and CD208,
respectively. A minority of tumors (13.4%, 23/172) contained
intraepithelial CD1a+ cells (Fig. 3B). No significant association
with any of the intraepithelial lymphocyte markers (CD3, CD8 or
CD20), activation markers (CD45RO or CD25), differentiation
markers (TIA-1, Granzyme B or FoxP3) or MHC class I or II
(Table 2) was seen, potentially due to the low number of CD1a+
cells. In contrast to CD1a, none of the tumors scored positive for
intraepithelial CD208+ cells. Parallel analysis of tonsil tissue
revealed the presence of many CD208+ cells, thereby validating
the IHC procedure.

About  half  of  tumors  (54.7%,  87/159)  contained  intraepithelial 
cells  expressing  the  macrophage  marker  CD68  (Fig.  3D).  CD68 
was  positively  associated  with  several  lymphocyte  markers  (CD3, 
CD8  and  CD20),  activation  markers  (CD45RO  and  CD25), 
differentiation  markers  (TIA-1,  Granzyme  B  and  FoxP3)  and 
MHC  class  I  (Table  2).  To  assess  the  presence  of  granulocytes,  the 
TMA was stained for myeloperoxidase. Twenty-four percent (37/154)
of tumors contained myeloperoxidase-expressing cells
(Fig. 3C); however, these showed no significant associations with
other markers, with the exception of CD4 (p = 0.034).

The  COX-2  enzyme  has  been  associated  with  inferior  survival 
in  EOC  when  expressed  in  the  epithelial  component  of  the  tumor 
[53].  Therefore,  tumors  were  scored  for  expression  of  COX-2  in 
the  epithelial  component  using  a  four-point  scale  (negative, 
equivocal [0-1%], patchy [1-50%] or diffuse [>50%])
(Fig. 2E&F). Two-thirds of tumors (66.5%, 111/167) were positive
for COX-2 (i.e., patchy or diffuse staining). In contrast to reports
in ovarian, cervical, and other cancers [13,64,65], the expression
of  COX-2  was  not  significantly  associated  with  any  of  the  immune 
infiltrates  studied  (Table  2). 

Associations  between  immune  infiltrates  and  patient 
survival  in  high-grade  serous  EOC 

Kaplan-Meier  analysis  was  performed  to  assess  the  association 
between  various  immune  infiltrates  and  disease-specific  survival 
(DSS).  Consistent  with  prior  reports  [5-7,10-17],  intraepithelial 
CD3+  and  CD8+  cells  were  associated  with  increased  DSS 
(p  =  0.0009  and  0.0008  respectively)  (Fig.  4A&B).  Intraepithelial 
CD4+  cells  showed  a  trend  towards  increased  DSS,  but  this  was 
not  statistically  significant  (Fig.  4C).  The  NK  cell  markers  CD56 
and  CD57  showed  no  association  with  DSS  (data  not  shown). 
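
For readers unfamiliar with the method, the Kaplan-Meier estimate behind curves such as those in Fig. 4 can be sketched in a few lines. This is an illustrative implementation of the standard product-limit estimator, not the software used in the study; the follow-up times and event flags in the example are invented.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up duration for each patient
    events: True if the patient died of disease at that time,
            False if follow-up was censored
    Returns a list of (time, survival_probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up (years) for four patients; the second is censored.
example_curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

In the example, the censored patient leaves the risk set at year 2 without lowering the curve, which is why the drop at year 3 is computed over two patients rather than three. Comparing marker-positive and marker-negative curves would additionally need a test such as the log-rank test, which is not shown here.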

Intriguingly, intraepithelial CD20+ cells were associated with
increased DSS (p = 0.0033) (Fig. 4E). Furthermore, the combination
of CD8+ and CD20+ infiltrates was associated with
significantly increased DSS over tumors that contained CD8+
infiltrates but not CD20+ infiltrates (median 4432 days vs. 2279
days, p = 0.0115) (data not shown).

In contrast to lymphocyte markers, the dendritic cell marker
CD1a showed no association with DSS, possibly due to the low
number of tumors containing CD1a+ cells. Likewise, the markers
CD68,  COX-2  and  myeloperoxidase  showed  no  association  with 
DSS  (Fig.  4F  and  data  not  shown). 

Given  the  association  between  CD8+  T  cells  and  DSS,  we 
evaluated  other  canonical  features  of  active  CTL  responses.  DSS 
was positively associated with intraepithelial TIA-1+ cells
(p = 0.0003), as well as expression of MHC class I and II by
tumor cells (p = 0.0014 and 0.0026 respectively) (Fig. 4G,I,&J).
Tumors that contained both CD8+ and TIA-1+ infiltrates were
associated with increased DSS compared to CD8+ TIA-1-negative
tumors (p = 0.0025). Several other T cell markers, including
Granzyme B, CD45RO and CD25, showed trends toward
increased DSS but did not reach statistical significance (Fig. 4H
and data not shown). OX-40 showed no apparent trend or
association with DSS, possibly due to low numbers of positive
cases (data not shown).

Figure 4. Immune infiltrates and survival in ovarian cancer. Kaplan-Meier curves showing disease-specific survival for patients scored as
positive or negative for (A) CD3, (B) CD8, (C) CD4, (D) FoxP3, (E) CD20, (F) CD68, (G) TIA-1, (H) Granzyme B, (I) MHC Class I and (J) MHC Class II. Data
were derived from optimally debulked patients with high-grade serous EOC.
doi:10.1371/journal.pone.0006412.g004

In  apparent  contrast  to  reports  that  regulatory  T  cells  are 
associated  with  poor  prognosis,  the  presence  of  intraepithelial 
FoxP3+  cells  was  associated  with  increased  DSS  (p  =  0.010) 
(Fig.  4D).  Moreover,  tumors  that  were  triply  positive  for 
intraepithelial  CD4+,  FoxP3+  and  CD25+  cells  showed  a  trend 
towards  increased  survival,  although  this  fell  short  of  statistical 
significance  (p  =  0.059).  Likewise,  tumors  positive  for  both 
intraepithelial  CD8+  and  FoxP3+  cells  showed  a  trend  toward 
increased  DSS  compared  to  tumors  that  were  positive  for  CD8+ 
cells  but  negative  for  FoxP3+  cells;  however,  this  trend  did  not 
reach  statistical  significance  (p  =  0.052).  Thus,  by  multiple 
analyses,  tumor-infiltrating  FoxP3+  cells  showed  a  trend  or 
statistically  significant  association  with  increased  DSS. 

The  association  between  immune  infiltrates  and  survival 
is  dependent  on  the  extent  of  residual  disease 

T  cell  infiltrates  are  reportedly  more  prevalent  in  patients  with 
optimal  versus  suboptimal  cytoreduction  [5,66].  To  investigate 
whether  this  was  true  for  other  lymphocyte  markers,  we  analyzed 
an  additional  cohort  of  220  high-grade  serous  cases  from  patients 
known  to  have  macroscopic  residual  disease  following  primary 
cytoreductive  surgery.  We  focused  on  CD8+  infiltrates,  as  well  as  the 
three  novel  prognostic  markers  from  the  preceding  analysis  (i.e., 
FoxP3,  TIA-1  and  CD20).  Compared  to  the  optimally  debulked 
patient cohort, patients with macroscopic residual disease had a
significantly lower prevalence of CD8+ (58.5%), FoxP3+ (20.2%),
TIA-1+  (39.5%)  and  CD20+  (16.3%)  infiltrates  (p<0.0001  for  all 
markers).  In  Kaplan-Meier  analysis  of  these  four  markers,  only  CD8+ 
infiltrates  had  a  significant  association  with  survival  (p  =  0.0044)  in 
patients  with  macroscopic  residual  disease  (data  not  shown). 
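A comparison of marker prevalence between two cohorts, such as the one reported above, can be sketched as a Pearson chi-squared test on a 2x2 table. This is a minimal illustration only: the analysis in the paper was run in JMP, the function name `chi_squared_2x2` is hypothetical, and the counts in the usage line are invented placeholders, not study data.

```python
from math import erfc, sqrt


def chi_squared_2x2(pos_a, n_a, pos_b, n_b):
    """Pearson chi-squared test (1 d.f., no continuity correction) comparing
    the proportion of marker-positive tumors in cohort A versus cohort B."""
    observed = [[pos_a, n_a - pos_a], [pos_b, n_b - pos_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [pos_a + pos_b, total - pos_a - pos_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for illustration only (not taken from the study).
stat, p_value = chi_squared_2x2(pos_a=30, n_a=50, pos_b=10, n_b=50)
```

The test assumes every expected cell count is greater than zero; with very small cohorts an exact test would be the more usual choice.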

The  association  between  immune  infiltrates  and  survival 
is  dependent  on  histological  subtype 

The  preceding  results  were  based  exclusively  on  high-grade 
serous  EOC  cases.  To  assess  the  association  between  immune 
infiltrates  and  DSS  in  other  histological  subtypes  of  EOC,  we 
performed  the  same  analyses  using  an  additional  288  EOC  tumors 
of the following histological subtypes: mucinous (n = 31),
endometrioid (n = 125) and clear cell (n = 132). These additional tumor
specimens  were  from  a  previously  described  cohort  of  optimally 
debulked  patients  [12]. 

In  general,  immune  infiltrates  were  less  prevalent  in  the  other 
histological  subtypes  compared  to  the  high-grade  serous  cases 
discussed  previously.  This  was  true  for  all  lymphocyte  markers 
studied  (i.e.,  CD3,  CD8,  CD4,  CD45RO,  CD25,  FoxP3,  TIA-1, 
Granzyme  B,  and  CD20)  (Fig.  5).  The  difference  was  most  striking 
for  the  markers  FoxP3,  CD25  and  CD20.  After  the  high-grade 
serous  cases,  the  next  highest  frequency  of  immune  infiltrates  was 
seen  in  the  endometrioid  subtype  (Fig.  5). 

We  examined  the  association  between  immune  infiltrates  and 
DSS  in  the  endometrioid  and  clear  cell  subtypes;  the  number  of 
mucinous  cases  was  too  small  to  perform  robust  statistical  analysis. 
For  endometrioid  cases,  the  only  significant  association  found  was 
between  MHC  class  II  expression  and  increased  DSS  (p  =  0.039) 
(data  not  shown).  For  clear  cell  cases,  the  only  significant 
association  found  was  between  the  presence  of  myeloperoxidase- 
positive  infiltrates  and  decreased  DSS  (p  =  0.040,  data  not  shown). 
Thus,  the  relationship  between  immune  infiltrates  and  survival 
differs  greatly  between  histological  subtypes  of  EOC. 

Figure 5. Prevalence of immune infiltrates and other markers across different histologic subtypes of EOC. Bars indicate the percentage
of tumors scoring positive for intraepithelial cells expressing CD3, CD8, CD4, CD45RO, CD25, FoxP3, TIA-1, Granzyme B, CD20 and CD68. Expression of
MHC class I and II by tumor epithelium is also shown. Data were derived from optimally debulked patients.
doi:10.1371/journal.pone.0006412.g005

Discussion

We  systematically  examined  the  relationship  between  immune 
infiltrates  and  patient  survival  in  three  large  EOC  series.  In  accord 
with  Clarke  et  al.  [12],  we  found  that  high-grade  serous  tumors 
have a distinct immunological profile compared to the endometrioid,
clear cell and mucinous subtypes. Furthermore, we found
that  immune  infiltrates  were  generally  more  prevalent  in  tumors 
from  patients  with  optimal  cytoreduction.  FoxP3,  TIA-1  and 
CD20  emerged  as  novel  immunological  markers  associated  with 
increased  patient  survival.  Our  results  highlight  the  importance  of 
histological  subtype  in  the  immunobiology  of  EOC,  which  may 
have  important  implications  for  the  immunotherapy  of  this  family 
of  diseases. 

Intraepithelial  lymphocytes  (i.e.,  cells  expressing  CD3,  CD4, 
CD8,  FoxP3  or  CD20)  were  more  prevalent  in  high-grade  serous 
cases,  followed  by  endometrioid  cases.  Moreover,  intraepithelial 
lymphocytes  were  more  prevalent  in  tumors  from  optimally 
debulked  patients  compared  to  patients  with  macroscopic  residual 
disease. A number of biological features of tumors appear to
influence the density of lymphocytic infiltrates. (a) T cell infiltrates
are positively associated with expression of MHC class I and II by
tumor cells (Table 2), as well as MHC class I antigen processing
machinery [15-17,67], suggesting that antigen presentation may
be an important determinant of T cell infiltration. (b) In accord
with this notion, tumors with loss or mutation of the BRCA1 or
p53 genes have an increased density of tumor-infiltrating T cells
[12,66]. This suggests that defective DNA repair and the ensuing
genomic instability in tumors may lead to the generation of
neo-antigens that trigger host T cell responses. (c) Signaling molecules
also play a role, as the density of tumor-infiltrating T cells is
negatively associated with expression of VEGF, B7-H1/PD-L1
and endothelin B receptor by tumors [5,11,68] and positively
associated with expression of the chemokines CXCL9, CCL21,
CCL22, CCL2 and CCL5 [5,28,69]. (d) Finally, two groups have
reported gene expression profiles that correlate with the presence
of tumor-infiltrating T cells in EOC [12,70]. These latter studies
confirm some of the above associations (e.g., MHC class I and II,
beta 2 microglobulin, TAP1 and 2) and identify new factors
associated with T cell infiltrates (e.g., IL-15, IL-32 and numerous
interferon-induced  genes).  Presumably  one  or  more  of  the  above 
factors  accounts  for  the  observed  enrichment  of  tumor-infiltrating 
lymphocytes  in  high-grade  serous  and  optimally  cytoreduced 
cases. 

Although  the  association  between  intraepithelial  CD8+  T  cells 
and  increased  survival  in  EOC  is  a  highly  reproducible  finding 
[10-17],  relatively  little  is  known  about  the  functional  phenotype  of 
these  CD8+  T  cells.  Several  lines  of  evidence  suggest  a  classic 
cytolytic  response  underlies  favorable  outcomes.  For  example, 
others  have  reported  positive  associations  between  survival  and 
intratumoral expression of IFN-γ [18,19], the IFN-γ receptor
[20], as well as numerous interferon-responsive genes such as
MHC class I [24-26], MHC class I antigen processing machinery
[17], MHC class II [15,16], and IRF-1 [21]. IL-18 [22] and TNF-α
[23] also appear to be important components of the T cell
response, as both cytokines are positively associated with survival.
We examined two components of cytolytic granules, Granzyme B
and TIA-1, both of which showed an association with CD8+ T cell
infiltrates. Of these two markers, only TIA-1 showed a statistically
significant association with survival in high-grade serous cases
(Fig. 4). TIA-1+ cells have also been described in medullary breast
cancer [71,72] and melanoma [73], where they are associated with
favorable prognostic features. By contrast, tumor-infiltrating TIA-1+
cells are associated with decreased survival in lymphoma [74-78].
Interestingly, TIA-1 is not simply a marker of cytolytic
granules; it is an RNA binding protein involved in
post-transcriptional mRNA regulation [79]. It remains to be determined
whether the association between intraepithelial TIA-1+
cells and survival in EOC is due to the role of this protein in
cytolytic granule function or mRNA regulation.

Treg  infiltrates  have  previously  been  associated  with  decreased 
survival in EOC [10,27,28]. However, in the present study
and  one  other  recent  report  [14],  FoxP3+  infiltrates  were 
associated  with  increased  survival.  These  seemingly  contradictory 
findings  may  be  attributable  to  several  factors.  First,  not  all  studies 
take  into  consideration  the  histological  subtypes  of  EOC,  or  the 
extent  of  residual  disease;  in  the  present  study,  FoxP3+  cells  were 
only  associated  with  survival  in  high-grade  serous  tumors  from 
optimally  debulked  patients.  Second,  a  variety  of  antibodies  have 
been  used  to  detect  FoxP3,  which  can  lead  to  discordant  results 
[80].  Third,  different  scoring  criteria  may  be  used.  For  example, 
the  precise  intratumoral  location  of  Tregs  is  an  important 
determinant  of  prognosis  in  gastric  cancer  [81].  Fourth,  the 
molecular  markers  used  to  define  Tregs  differ  between  studies. 
Although  FoxP3  is  still  regarded  as  the  most  reliable  marker  of 
Tregs  in  human  cancer  [82,83],  it  can  also  be  expressed  by 
epithelial  tumor  cells  [84-86]  and  in  vitro  activated  CD4+  and 
CD8+  T  cells  [87-95].  For  these  reasons,  some  studies  include 
CD25  as  a  second  marker  of  Tregs  [10,28].  However,  like  FoxP3, 
CD25  is  potentially  expressed  by  effector  T  cells,  so  it  is  not  clear 
that  dual  staining  for  FoxP3  and  CD25  more  accurately  identifies 
Tregs [89,96]. Other characteristics of Tregs include high
expression of GITR and CTLA-4 and low expression of CD127
and CD49d [97,98]; however, these markers are technically
difficult to assess on paraffin-embedded TMAs.

These technical considerations notwithstanding, there is mounting
evidence that tumor-infiltrating FoxP3+ cells are associated
with a favorable prognosis in EOC, colorectal cancer, head and
neck cancer, and lymphoma [14,33-36,99-102]. How might
FoxP3+  T  cells  promote  favorable  outcomes?  In  the  present  study, 
FoxP3+  cells  were  strongly  associated  with  other  effector  T  cells, 
and  similar  results  have  been  reported  in  melanoma  [103].  Thus, 
FoxP3+  cells  may  simply  be  an  indicator  of  a  strong  CD8+  T  cell 
response,  which  might  outweigh  any  immunosuppressive  effects  of 
FoxP3+  cells.  Alternatively,  subsets  of  human  FoxP3+  T  cells  have 
recently  been  shown  to  have  a  pro-inflammatory,  IL-17-producing 
phenotype [104-106]. Indeed, CD4+ T cells can be skewed toward
this so-called Th17 phenotype by exposure to TGF-β in
combination with IL-6, IL-1 or IL-23 [107-109]. These factors
are present in the EOC tumor environment [8], and accordingly,
Th17 cells have been reported in EOC [110-112]. Thus, the
association between FoxP3+ cells and increased survival could
potentially reflect an underlying Th17-like anti-tumor response.
Clearly, more work is required to determine the extent to which
FoxP3+ T cells in EOC represent Tregs versus Th17 or other
effector  T  cells. 

The  observation  that  intraepithelial  CD20+  infiltrates  are 
associated with increased DSS is a novel finding in EOC. Dong
et al. reported that B cells in ascites were associated with shorter
survival in EOC [113]; however, their study focused on B cells in
peritoneal  and  pleural  effusions  collected  after  chemotherapy, 
which  by  definition  constitutes  a  poor  outcome  cohort.  Indeed,  in 
the  present  study,  intraepithelial  CD20+  B  cells  showed  no 
association  with  survival  in  patients  with  high-risk,  suboptimally 
debulked  disease.  Tumor-infiltrating  CD20+  B  cells  are  a  hallmark 
of  medullary  breast  cancer  and  have  been  proposed  to  mediate  a 
favorable prognosis [114,115]. Moreover, the presence of a B cell
transcriptional signature in node-negative breast cancer is
associated with increased survival [116]. B cell infiltrates in breast
cancer represent clonally expanded populations, express somatically
hypermutated IgG molecules, and recognize target antigens
such as ganglioside D3 and surface-translocated actin
[114,115,117-120]. It is unclear how tumor-infiltrating B cells
promote favorable outcomes in cancer. In theory, their actions
could be mediated by secreted antibodies, which can promote the
opsonization of tumor antigens, complement-mediated destruction
of tumor cells, or antibody-dependent cellular cytotoxicity. Apart
from producing antibodies, B cells can also present antigen to both
CD4+ and CD8+ T cells [121-131]. In this regard, it is noteworthy
that ovarian tumors show low numbers of CD1a+ dendritic cells;
perhaps CD20+ B cells serve as alternative antigen presenting cells
in the tumor environment. This latter idea fits well with the
observed co-localization of tumor-infiltrating B cells and CD8+ T
cells in EOC, as well as in medullary breast cancer, non-small cell
lung cancer and cervical cancer [72,132-134].

The  presence  of  macrophages  has  been  associated  with  poor 
prognosis  in  various  human  cancers  [135,136].  However,  in 
accord with a prior report by Shah et al. [66], we found no
association between CD68+ infiltrates and survival in EOC.
Importantly, however, CD68 is not a perfect marker of
macrophages, as it is also expressed by dendritic cells and some
non-myeloid cells [137]. Furthermore, CD68 does not distinguish
between macrophages polarized towards the pro-inflammatory
(M1) or tumor-promoting (M2) phenotypes. M1 macrophages
have the capacity to kill tumor cells, whereas M2 macrophages
promote  tissue  repair  and  angiogenesis  [136].  Similarly,  an 
immunosuppressive  subpopulation  of  macrophages  has  been 
described  in  EOC  based  on  expression  of  the  signaling  molecule 
B7-H4  [52].  Thus,  additional  functional  markers  may  be  required 
to  fully  define  the  role  of  macrophages  in  the  immunobiology  of 
EOC. 

While  this  study  focused  on  the  relationship  between  immune 
infiltrates  and  prognosis  after  standard  treatments,  the  results  may 
also  inform  the  design  of  immunotherapies  for  EOC.  First,  our 
findings  suggest  that  high-grade  serous  tumors  may  be  especially 
sensitive to T cell responses. Second, our data indicate that, in
patients  with  residual  disease,  the  influence  of  T  cells  may  be 
overwhelmed  by  other  factors.  Third,  the  positive  association 
between  intraepithelial  FoxP3+  cells  and  survival  reported  here 
and  previously  [14]  prompts  a  reconsideration  of  strategies  to 
deplete  regulatory  T  cells  from  EOC  patients.  And  fourth,  the 
association  between  intraepithelial  CD20+  cells  and  survival 
suggests the humoral immune response may play an important
role in anti-tumor immunity that could be exploited therapeutically
in parallel with CD8+ T cell responses.

Materials  and  Methods 

Study  subjects 

All  specimens  and  clinical  data  were  obtained  with  informed 
written  consent  under  protocols  approved  by  the  Research  Ethics 
Board of the BC Cancer Agency and the University of British
Columbia.  The  main  cohort  used  for  this  study  consisted  of  199 
women  with  high-grade  serous  ovarian  cancer  seen  at  the  BC 
Cancer  Agency  from  1984  to  2000  (OvCaRe  Ovarian  Tumour 
Bank,  Vancouver,  BC,  Canada).  Tumor  tissue  was  obtained  at  the 
time  of  primary  surgery  prior  to  any  other  treatment.  Patients  had 
no  macroscopic  residual  disease  following  surgical  debulking.  All 
patients  underwent  standard  treatment  consisting  of  surgery 
followed  by  standard  platinum-based  chemotherapy.  Table  1 
shows  the  general  clinical  characteristics  of  the  199-case  cohort. 
We also analyzed a second cohort of mucinous (n = 31),
endometrioid (n = 125) and clear cell (n = 132) EOC cases.
Patients in this cohort were also diagnosed from 1984 to 2000,
were optimally cytoreduced, and received platinum-based
chemotherapy. Finally, we analyzed a third cohort of 220 high-grade
serous EOC patients categorized as extreme risk due to the
presence of residual macroscopic disease. Patients in this cohort
were treated from 1996 to 2000 and received platinum-based
chemotherapy.

Tumor  specimens 

Tumor  tissue  was  obtained  during  primary  cytoreductive 
surgery,  fixed  in  10%  neutral  buffered  formalin,  processed  using 
standard procedures and embedded in paraffin. A tissue microarray
(TMA) was constructed by taking duplicate 0.6 mm cores
from  representative  regions  of  each  tumor  block  after  review  of 
hematoxylin-  and  eosin-stained  sections  by  a  pathologist.  TMAs 
were  assembled  using  a  Pathology  Devices  tissue  arrayer 
(Westminster,  MD). 

Immunohistochemistry 

Immunohistochemistry  for  CD20,  CD3,  CD4,  CD8  and 
Granzyme B was performed as described in Clarke et al. [12].
The remaining unstained slides were received at the Trev and
Joyce Deeley Research Centre where immunohistochemistry was
performed for CD45RO, TIA-1, FoxP3, CD25, OX-40, CD56,
CD57, CD1a, CD208, myeloperoxidase, CD68, COX-2, MHC
Class  I  and  MHC  Class  II.  Following  deparaffinization,  the  slides 
were  placed  in  a  Ventana  Discovery  XT  autostainer  (Ventana, 
Tucson,  AZ)  for  immunohistochemical  staining.  Ventana’s 
standard  CC1  protocol  was  used  for  antigen  retrieval.  Primary 
antibodies  are  listed  in  Table  3. 

TMAs  were  incubated  with  primary  antibodies  for  60  minutes 
at  room  temperature,  and  the  appropriate  cross-adsorbed, 
biotinylated secondary antibody (Jackson Immunoresearch, West
Grove,  PA)  was  applied  for  32  minutes.  Bound  antibodies  were 
detected  using  the  DABMap  kit  (Ventana),  counterstained  with 
hematoxylin  (Ventana),  and  coverslipped  manually  with  Cytoseal- 
60  (Richard  Allan,  Kalamazoo,  MI). 

Histopathological  analysis 

Immunostained  TMAs  were  examined  by  a  pathologist  and 
scored  using  a  variety  of  methods  depending  on  the  marker 
studied.  For  CD20,  CD8,  CD4,  CD45RO,  TIA-1,  Granzyme  B, 
CD25, OX-40, CD1a, CD56, CD57, and myeloperoxidase, only
cells  residing  within  the  epithelial  compartment  of  the  tumor  were 
counted.  FoxP3  was  similarly  scored  within  the  epithelium  but  a 
stromal  score  was  also  obtained.  Tumors  were  scored  as  0  (no 
cells),  1  (1-5  cells),  2  (6-19  cells)  or  3  (20+  cells);  results  were 
binarized  as  positive  (IHC  score  1,  2,  or  3)  or  negative  (IHC  score 
0).  CD3  was  scored  as  0  (no  cells  present),  1  (cells  present  in  stroma 
only),  2  (cells  present  in  the  epithelial  compartment)  or  3  (cells 
present  in  both  the  epithelial  and  stromal  regions  of  the  tumor); 
scores  of  0  and  1  were  reported  as  negative  while  a  score  of  2  or  3 
was  reported  as  positive.  CD68  was  scored  as  0  (no  cells  present),  1 
(luminal  or  stromal  cells),  2  (scattered,  <20  intraepithelial  cells),  or 
3 (>20 intraepithelial cells); results were binarized in the same
manner as CD3. COX-2 was scored as 0 (negative), 1 (equivocal,
0-1%), 2 (patchy, >1% to 50%) or 3 (diffuse, >50%); scores of 0
and  1  were  reported  as  negative  and  scores  of  2  or  3  were  reported 
as  positive.  For  MHC  class  I  and  II,  samples  were  scored  as  0 
(negative),  1  (focal,  <10%),  2  (patchy,  10-50%)  or  3  (diffuse, 
>50%).  Scores  of  0,  1  or  2  were  reported  as  negative  and  scores  of 
3  were  reported  as  positive. 
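The scoring and binarization rules described above can be summarized in a short sketch. This is an illustrative encoding of the stated thresholds, not software used in the study; the names `count_to_score` and `is_positive` and the marker groupings as Python sets are assumptions made for the example. FoxP3 is handled here by its intraepithelial score only, and CD208 is omitted because its scoring is not described.

```python
# Markers scored by intraepithelial cell count: any cells (score >= 1) count as positive.
COUNT_MARKERS = {
    "CD20", "CD8", "CD4", "CD45RO", "TIA-1", "Granzyme B", "CD25",
    "OX-40", "CD1a", "CD56", "CD57", "myeloperoxidase", "FoxP3",
}
# Markers for which scores of 0 and 1 are negative and 2 or 3 are positive.
THRESHOLD_2_MARKERS = {"CD3", "CD68", "COX-2"}
# Markers for which only diffuse staining (score 3) is positive.
THRESHOLD_3_MARKERS = {"MHC class I", "MHC class II"}


def count_to_score(n_cells):
    """Map an intraepithelial cell count to the 0-3 IHC score."""
    if n_cells < 0:
        raise ValueError("cell count cannot be negative")
    if n_cells == 0:
        return 0
    if n_cells <= 5:
        return 1
    if n_cells <= 19:
        return 2
    return 3


def is_positive(marker, score):
    """Binarize a 0-3 IHC score using the marker-specific cut-off."""
    if score not in (0, 1, 2, 3):
        raise ValueError("score must be 0, 1, 2 or 3")
    if marker in COUNT_MARKERS:
        return score >= 1
    if marker in THRESHOLD_2_MARKERS:
        return score >= 2
    if marker in THRESHOLD_3_MARKERS:
        return score == 3
    raise KeyError(f"no scoring rule defined for {marker!r}")
```

For example, a core with 7 intraepithelial CD8+ cells would receive a score of 2 and be called positive, whereas patchy MHC class I staining (score 2) would be called negative.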

Table 3. Primary antibodies used for immunohistochemistry.

Antigen                      Clone                            Supplier              Source  Concentration
CD3                          Polyclonal                       Cell Marque           Rabbit  1/300
CD8                          C8/144C                          DAKO                  Rabbit  1/50
CD4                          4B12                             Novocastra            Mouse   1/50
Granzyme B                   GrB-7                            DAKO                  Mouse   1/25
CD45RO                       UCHL-1                           Lab Vision            Mouse   1/300
TIA-1                        TIA-1                            Abcam                 Mouse   1/50
FoxP3                        eBio7979                         eBioscience           Mouse   1/50
CD25                         4C9                              Lab Vision            Mouse   1/40
OX-40 (CD134)                ACT35                            BD Pharmingen         Mouse   1/50
CD20                         L26                              DAKO                  Mouse   1/250
CD56                         123C3.D5                         Lab Vision            Mouse   1/50
CD57                         NK1                              Lab Vision            Mouse   1/200
CD1a                         O10                              Lab Vision            Mouse   1/50
CD208                        1010E1.01                        Imgenix               Rat     1/50
Myeloperoxidase              Polyclonal, Catalogue # RB-373   Lab Vision            Rabbit  1/200
CD68                         PG-M1                            Lab Vision            Mouse   1/50
Cyclooxygenase-2 (COX-2)     SP21                             Cell Marque           Rabbit  1/10
MHC class I (A, B, C)        EMR8-5                           MBL                   Mouse   1/500
MHC class II (DR, DP & DQ)   CR3/43                           Affinity BioReagents  Mouse   1/50

doi:10.1371/journal.pone.0006412.t003


Statistical  analysis 

Statistical  analysis  was  performed  using  JMP  statistical  software 
(v7.0)  (SAS  Institute,  Cary,  NC).  Univariate  analysis  was  carried 
out using the chi-squared statistic. The log-rank test was used to
compare Kaplan-Meier curves. P-values less than 0.05 were
considered significant.
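For readers unfamiliar with these methods, the Kaplan-Meier estimator and the two-group log-rank test can be sketched in a few lines. The study itself used JMP; the pure-Python functions below (`kaplan_meier`, `log_rank`) are hypothetical names for an illustrative implementation of the standard textbook formulas, with event flags set to True for death from disease and False for censored observations.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-squared statistic, p-value), 1 d.f."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = observed_minus_expected ** 2 / variance
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))
```

In use, the marker-positive and marker-negative groups would each supply a list of follow-up times in days and a matching list of event flags, and a p-value below 0.05 would be read as a significant difference between the two survival curves.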

The adaptive immune system can protect against spontaneously arising tumors, and the potential exists to
reduce  cancer  incidence  by  priming  adaptive  immune  responses  with  vaccines.  Immunologic  cancer  control 
has  been  implemented  for  cancers  caused  by  infectious  agents,  but  not  for  spontaneous  cancers  caused  by 
mutation.  This  is  largely  due  to  the  high  cost  of  preventative  clinical  trials  and  the  lack  of  validated  tumor 
epitopes.  Here  we  evaluate,  computationally,  all  known  somatic  mutations  in  human  tumors  for  their 
antigenic  potential.  All  possible  human  leukocyte  antigen  (HLA)  class  I  presented  peptides  containing 
recurrent  somatic  cancer  mutations  with  frequency  >  5%  were  screened  by  three  independent  epitope 
prediction algorithms (SYFPEITHI, BIMAS, and IEDB). Using stringent filters, a total of 20 genes, 35 mutations,
and  159  candidate  epitopes  were  identified,  each  presented  by  up  to  four  distinct  HLA  class  I  alleles.  The 
top-ranking  gene  from  our  survey  was  KRAS,  which  figures  prominently  because  there  are  frequent  hotspot 
mutations  in  numerous,  prevalent  cancers,  and  mutant  peptides  are  predicted  to  be  presented  by  several 
common  HLA  alleles.  From  our  data,  we  estimate  that  prophylactic  vaccination  could  provide  meaningful 
levels  of  prevention  of  tumors  associated  with  common  recurrent  mutations. 
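The first step of such a screen, enumerating every candidate peptide that spans a mutated residue, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the function name `mutant_peptides` is hypothetical, the 8 to 11 residue window reflects the typical length of HLA class I peptides, and the KRAS N-terminal fragment with a G12D substitution is included only as an example input that should be checked against a reference sequence before any real use. Scoring the peptides with the prediction algorithms is not shown.

```python
def mutant_peptides(protein, position, mutant_residue, lengths=(8, 9, 10, 11)):
    """List every peptide of the given lengths that contains the mutated residue.

    `position` is 1-based, following the usual convention for naming mutations.
    """
    index = position - 1
    if not 0 <= index < len(protein):
        raise ValueError("position lies outside the protein sequence")
    mutant = protein[:index] + mutant_residue + protein[index + 1:]
    peptides = []
    for k in lengths:
        first = max(0, index - k + 1)        # leftmost window still covering the residue
        last = min(index, len(mutant) - k)   # rightmost window that fits in the sequence
        for start in range(first, last + 1):
            peptides.append(mutant[start:start + k])
    return peptides


# Illustrative input: N-terminal fragment of KRAS with the G12D hotspot substitution.
KRAS_N_TERM = "MTEYKLVVVGAGGVGKSALTIQLIQ"
candidates = mutant_peptides(KRAS_N_TERM, 12, "D")
```

Away from the protein termini, a single substitution yields k windows for each peptide length k, so the number of candidates per mutation stays small enough to score exhaustively against many HLA alleles.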

© 2010 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.


1.  Introduction 

More  than  a  century  ago  it  was  recognized  independently  by 
Ehrlich  as  well  as  Bashford  and  colleagues  [1,2]  that  cells  isolated 
from  rodent  tumors  could  be  lethal  when  injected  into  naive  mice 
but  were  unable  to  proliferate  when  injected  at  a  new  site  in  the 
same  animal  from  which  they  were  obtained,  even  as  the  original 
tumor continued to grow. This phenomenon was termed concomitant
immunity and clearly illustrated that the mammalian immune
system is effective in eliminating cancer if the burden of
malignant cells is low, but is less effective for established tumors.
Half a century later, a series of definitive experiments by Prehn and
Main [3] showed that inoculation with tumor, but not normal
tissue, was protective and, in time, the theory of cancer
immunosurveillance was formally established [4]. Recently, it has been
shown that Rag2-/- mice, which have no V(D)J recombination and
therefore no repertoire of mature lymphocytes, have dramatically
increased incidence of spontaneous tumors [5]. Further, it has been
shown that tumors formed in immunologically permissive Rag2-/-
hosts are more immunogenic when transferred to wild-type mice,
and that host-immune pressures can maintain tumors in an
equilibrium state [6]. Based on these observations, the early theory of
immunosurveillance has been revised to that of immunoediting [7],
which holds that spontaneously arising tumor cells are frequently
eliminated by the immune system. Those that begin to grow are
held in a state of equilibrium with host immunity, from which they
may  eventually  escape  by  various  mechanisms  such  as  loss  of 
antigen,  loss  of  antigen  presentation  pathways,  or  interference  by 
regulatory  T  cells.  The  potential  to  use  vaccination  to  mobilize 
adaptive  immunity  against  cancer  has  been  illustrated  in  mice 
engineered  to  express,  for  example,  the  SV40  T  antigen  [8],  or 
activated rat Erbb2 [9-12]. In these animal models, pre-exposure
to  antigen  can  reproducibly  provide  complete  protection  against 
tumor  development. 

In  humans,  it  is  already  well  established  that  cancers  of  viral 
origin  can  be  prevented  by  vaccination,  and  Gardasil®  (Merck  & 
Co.,  Inc.,  Whitehouse  Station,  NJ)  now  makes  this  a  clinical  reality 
for cervical cancer [13]. For tumors of mutational origin, the
principal lines of evidence for antitumor adaptive immune responses
are as follows. First, there are numerous case reports of donated
organs in which occult cancers, initially held in check by donor
immunity, undergo rapid and progressive outgrowth after
transplantation because recipients are naive to the tumor antigens and
immunosuppressed [14-16]. Second, the only controlled study
that  has  evaluated  cancer  rates  in  immunosuppressed  individuals 
(Scandinavian  kidney  transplant  recipients)  reports  a  clear  increase 
in  the  incidence  of  a  wide  variety  of  noninfectious  primary  cancers 
[17].  Third,  it  is  well  established  in  solid  tumors  that  patients  with 
detectable  tumor  infiltrating  lymphocytes  have  better  outcomes, 
both in terms of progression-free interval and overall survival
[18-20]. Finally, natural immune responses to tumor-specific
antigens are detectable in patients with cancer. Humoral responses
have been extensively characterized in patient serum, and
antibodies against common tumor antigens have been found, in
some studies, in more than half of cancer patients but in very few
healthy control individuals (reviewed in [21]). Natural cellular
immune responses are more difficult to assess, and efforts have
focused on characterization of tumor-infiltrating rather than
circulating lymphocytes. CD4+ and CD8+ tumor-reactive T cells
have been identified that recognize point mutations in numerous
genes such as fibronectin [22], HSP70 [23], major histocompatibility
complex class I [24], N-RAS [25], β-catenin [26],
MART-2 [27], and p14ARF [24].

It  is  conspicuous  that  after  more  than  a  century  of  studying 
cancer immunology we still do not have immunologic cancer control
(i.e., vaccines) [28] for the most common and deadly cancers.
The reasons for this include the fact that it is difficult to test
preventative strategies against cancer. The likelihood of any individual
developing a specific type of tumor is relatively low, and when tumors
do appear, they occur over a wide age range such that preventative
clinical trials require the involvement of a large number of subjects
over many years. These trials are, therefore, expensive and require
extraordinary justification. Further, for most cancers we do not
have validated antigens. In cancers of viral origin, the human immune
system responds to non-self antigens from the virus. In
contrast, vaccination against tumors of mutational origin involves
important and unique considerations. Antigens must be sufficiently
common in the target cancer to have a meaningful public
health impact and they must be reliably immunogenic (i.e., nontolerated).
They should also be causally involved in tumorigenicity
and  confer  a  selective  advantage  necessary  for  tumor  survival,  such 
that  the  risk  of  immune  escape  by  antigen  loss  is  minimized. 

Most oncogenic mutations are incurred in intracellular signaling
proteins. In nucleated cells, intracellular proteins are cleaved
and a subset of the cleavage products are presented at the cell
surface by human leukocyte antigen (HLA) class I, where they are
subjected to surveillance by the repertoire of cytotoxic T
lymphocytes. Because HLA class I restricted peptides are short (~8-11
amino acids), even single amino acid changes such as the hotspot
mutations incurred by common tumor genes can be sufficient for
cytotoxic T lymphocytes to single out and selectively kill cells
presenting the mutant peptide [29,30]. For these reasons mutational
epitopes (rather than expression or differentiation epitopes) will
likely prove to be best suited for preventative vaccines against
spontaneous cancers. As of this writing, only 39 HLA class I
restricted mutational epitopes have been reported and compiled by
the cancer immunity database (http://www.cancerimmunity.org).
In this set there is over-representation of melanoma epitopes
(17/39) and epitopes that are HLA-A2 restricted (11/39). HLA loci,
particularly HLA antigen-binding regions, are the most highly
polymorphic sites in the genome, with 3,477 HLA class I allelic
sequences now recognized (http://hla.alleles.org/alleles/index.html).
Different HLA alleles have different antigen presentation
characteristics, such that presentation of any given peptide is often restricted
to, or strongly favored by, a specific HLA allele [31]. HLA restriction
is therefore a key consideration when assessing the immunogenicity of
oncogenic mutations.

Here, we present a meta-analysis of predicted HLA class I
restricted epitopes from recurrent tumor mutations using established
databases and epitope prediction tools. We have evaluated,
computationally, all mutant peptides derived from all known tumor
mutations. Epitopes have been ranked according to (1) the
estimated frequency of mutation in a given tumor type, (2) the
incidence of that tumor, (3) the likelihood of the mutation producing
an HLA-presented T-cell epitope, and (4) the estimated population
frequency of the presenting HLA allele. Because of the stringency
of our analysis our predictions are very conservative, but
offer the first quantitative estimate of the minimum benefit that
might be expected from preventative cancer vaccination, and provide
a short list of the best predicted epitopes for further study.

2.  Methods 

2.1.  Selection  of  candidate  tumor  mutations 

Researchers at the Sanger Institute [32-34] have undertaken the
daunting task of curating and cataloging over 50,000 published
somatic mutations from almost 4,800 genes in 250,000 tumors
scattered across the vast cancer literature. Their database, COSMIC
(Catalogue of Somatic Mutations in Cancer), tracks each gene and
somatic change observed, its frequency, and the primary tissue
where it occurs. Using COSMIC v43 Release
(http://www.sanger.ac.uk/genetics/CGP/cosmic/), every combination of primary tissue,
gene, and somatic mutation was considered for our study. Only
somatic mutations seen in 5% or more of tissue samples, with
at least five positives identified from 100 or more samples,
were retained for further analysis. A further condition for tumor
mutation selection was that the cancer incidence be known for
each tissue type where the mutation occurred. Cancer incidence
data from the Surveillance, Epidemiology and End Results (SEER)
program (http://seer.cancer.gov), which is the definitive source for
cancer statistics in the United States, were used for this purpose.
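The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the record layout, the function name `passes_selection`, and the example records are hypothetical, and the reading of the thresholds (at least 5% frequency, at least five positives, at least 100 samples tested, incidence known) is an assumption drawn from the paragraph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MutationRecord:
    gene: str
    mutation: str
    tissue: str
    positives: int              # samples carrying the mutation
    tested: int                 # samples screened for it
    incidence: Optional[float]  # cases per 100k for the tissue, None if unknown


def passes_selection(rec, min_freq=0.05, min_positives=5, min_tested=100):
    """Return True if a record meets the frequency, count and incidence criteria."""
    if rec.incidence is None:
        return False
    if rec.tested < min_tested or rec.positives < min_positives:
        return False
    return rec.positives / rec.tested >= min_freq


# Hypothetical records, for illustration only (not COSMIC data).
records = [
    MutationRecord("GENE_A", "X10Y", "tissue 1", 12, 150, 20.0),  # 8.0%: kept
    MutationRecord("GENE_B", "X20Z", "tissue 2", 4, 50, 20.0),    # too few samples
    MutationRecord("GENE_C", "X30W", "tissue 3", 6, 400, 20.0),   # 1.5%: below 5%
    MutationRecord("GENE_D", "X40V", "tissue 4", 30, 200, None),  # incidence unknown
]
kept = [r.gene for r in records if passes_selection(r)]
print(kept)
```

Only the first hypothetical record survives all of the checks.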

First,  we  defined  a  Mutation  Impact  Score  (MIS)  as  follows: 

MIS = frequency of mutation in given tumor × tumor incidence

Each mutation was considered independently, regardless of
whether there were other mutations in the same gene, or even at
the same codon (e.g., KRAS G12V and KRAS G12D were treated as
independent mutations). For comparing mutations we also
calculated a cumulative MIS, which was the sum of the MIS for all tumor
sites in which that mutation was detected (Figure 1).
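As a worked example, the MIS and cumulative MIS can be computed as below. The incidence and frequency values are the KRAS G12D rows of Table 1; the function name and data layout are assumptions made for this sketch, and the frequency is expressed as a fraction so that the product matches the tabulated MIS.

```python
from collections import defaultdict


def mutation_impact_score(incidence_per_100k, mutation_frequency):
    """MIS = mutation frequency (as a fraction) x tumor incidence (per 100k)."""
    return incidence_per_100k * mutation_frequency


# (gene, mutation, tissue, incidence per 100k, mutation frequency) from Table 1
rows = [
    ("KRAS", "G12D", "Biliary tract", 1.14, 0.1543),
    ("KRAS", "G12D", "Endometrium", 23.49, 0.0567),
    ("KRAS", "G12D", "Large intestine", 45.94, 0.1132),
    ("KRAS", "G12D", "Ovary", 12.62, 0.0547),
    ("KRAS", "G12D", "Pancreas", 11.70, 0.2857),
    ("KRAS", "G12D", "Small intestine", 1.92, 0.0833),
]

# Cumulative MIS: sum of the per-tissue MIS for each (gene, mutation) pair.
cumulative = defaultdict(float)
for gene, mutation, tissue, incidence, frequency in rows:
    cumulative[(gene, mutation)] += mutation_impact_score(incidence, frequency)

for key, value in cumulative.items():
    print(key, round(value, 2))
```

Summing over tissues is what lets a mutation that is moderately frequent in several common tumors outrank one that is very frequent in a single rare tumor.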

2.2.  Identification  of  putative  HLA  ligands 

We used computational HLA-binding prediction tools to identify
mutations likely to be presented by specific HLA alleles as
peptide ligands and, by extension, T-cell epitopes. Using web-based
SYFPEITHI [35] (http://www.syfpeithi.de), BIMAS [36]
(http://www-bimas.cit.nih.gov/molbio/hla_bind/), and IEDB [37,38]
(Stabilized Matrix Method;
http://tools.immuneepitope.org/analyze/html/mhc_binding.html), we queried each candidate mutation in a
systematic and high-throughput fashion, testing all combinations
of peptide lengths and HLA class I alleles available to query on the
hosting server. These prediction algorithms use empirical
peptide-HLA binding data to build models of peptide sequence specificity,
and give a predictive outcome for yet untested sequences. In these
models, the amino acids found at each position of short peptides
eluted from real HLA molecules are compiled in a database. For
SYFPEITHI and BIMAS a two-dimensional matrix is built using the
observed frequency of each amino acid at each position within the
HLA peptide binding pocket. The stabilized matrix method (IEDB)
considers, in addition, pairwise interactions between amino acid
positions and is thought to yield more accurate predictions [38]. These
three tools are the most heavily used and highly cited tools for in
silico HLA-binding predictions [39].
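The idea of a two-dimensional position-by-residue matrix can be illustrated with a generic log-odds scoring matrix. This is only a sketch of the general technique under stated assumptions: it is not the actual scoring scheme of SYFPEITHI, BIMAS, or the IEDB stabilized matrix method, the ligand list is invented toy data, and the uniform background and pseudocount are arbitrary choices.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_matrix(ligands, pseudocount=1.0):
    """Build a position-specific log-odds matrix from equal-length ligands."""
    length = len(ligands[0])
    matrix = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in ALPHABET}
        for peptide in ligands:
            counts[peptide[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against a uniform background (1/20).
        matrix.append({aa: math.log((c / total) * 20) for aa, c in counts.items()})
    return matrix


def score(matrix, peptide):
    """Sum the per-position log-odds for a peptide of matching length."""
    return sum(matrix[i][aa] for i, aa in enumerate(peptide))


# Hypothetical "eluted ligands" (toy data, not real HLA measurements).
toy_ligands = ["KLVAV", "ALVGV", "KLAGV", "ALVAV"]
matrix = build_matrix(toy_ligands)

print(score(matrix, "KLVAV"))  # resembles the toy ligands: higher score
print(score(matrix, "WWWWW"))  # unlike any toy ligand: lower score
```

Because raw scores from different tools are built on different scales, a sketch like this also shows why ranks rather than scores are the natural quantity to compare across tools.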

A script using the Perl LWP and HTML::Form modules was developed
to handle high-throughput form submission and post-prediction data
extraction. This script was run in a manner that minimized impact on
the hosting service, with a 5-second buffer between each
submission. All overlapping 8-, 9-, 10-, and 11-mer mutational peptides
were tested against all HLA class I combinations supported by the
host servers. For each peptide containing the mutation of interest,
the output in terms of HLA-binding prediction method, rank, score,


R.L.  Warren  and  R.A.  Holt /Human  Immunology  71  (2010)  245-254 




[Figure 1: horizontal bar chart (x-axis 0 to 100) with one EIS bar and one MIS bar per mutation. Mutations listed from top to bottom: KRAS G12D, KRAS G12V, HRAS G12V, KRAS G12C, KRAS G12R, KRAS G13D, MSH6 P1087fs*5, NPM1 W288fs*12, FGFR3 G697C, CTNNB1 S45F, TP53 R248Q, CTNNB1 S45del, CTNNB1 T41A, ABL1 T315I, FGFR3 Y373C, PIK3CA E545K, TP53 R273H, PDGFRA D842V, EGFR L858R, IDH1 R132H, KIT D816V, BRAF V600E, HNF1A P291fs*51, FGFR3 S249C, CTNNB1 S37A, PIK3CA H1047R, EGFR E746_A750del, FGFR3 R248C, NRAS Q61K, NRAS Q61R, RET M918T, MET Y1253D, JAK2 V617F, APC S1341R, APC T1556fs*3.]


Fig. 1. Sum of mutation impact score (MIS) and sum of epitope impact score (EIS) for each candidate tumor-associated antigen identified by our study. The EIS, which takes
into account the average HLA frequency in the population and the HLA-binding prediction ranking, is high for all KRAS mutants, which suggests that their predicted epitopes
would be expected to have the greatest utility for immunologic cancer control.


start coordinate, length, peptide sequence, and HLA allele were
stored in a local custom MySQL database for future querying. To be
considered further, each peptide-HLA allele combination had to be
found by at least two of the three epitope prediction algorithms and
the predicted HLA ligand had to be within the top-scoring 5% of all
possible peptides from that protein. Only HLA class I coding
variants were considered (i.e., 4-digit resolution). Candidate epitopes
that passed this filter were given an epitope score (ES).
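The enumeration of overlapping 8- to 11-mer peptides that contain a given point mutation can be sketched as follows. The function name and return layout are hypothetical, and the 22-residue KRAS N-terminal sequence is written from memory as an assumption that should be checked against a reference sequence before any real use. Frameshifts and deletions would need separate handling that is not shown here.

```python
def mutant_peptides(protein, position, new_residue, lengths=(8, 9, 10, 11)):
    """List (start, length, sequence) for every window covering the mutated residue.

    `position` and the returned `start` are 1-based protein coordinates.
    """
    idx = position - 1
    mutant = protein[:idx] + new_residue + protein[idx + 1:]
    peptides = []
    for k in lengths:
        first = max(0, idx - k + 1)        # leftmost window that still covers idx
        last = min(idx, len(mutant) - k)   # rightmost window that fits in the protein
        for start in range(first, last + 1):
            peptides.append((start + 1, k, mutant[start:start + k]))
    return peptides


# Assumed N-terminal residues 1-22 of human KRAS (verify before relying on this).
KRAS_NTERM = "MTEYKLVVVGAGGVGKSALTIQ"

peptides = mutant_peptides(KRAS_NTERM, 12, "R")  # KRAS G12R
for start, length, sequence in peptides[:5]:
    print(start, length, sequence)
```

Each returned window would then be submitted to the prediction servers for every supported HLA class I allele, and only peptide-allele pairs reported by at least two tools within the top 5% would be kept.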

ES = 100 / (AVG_RANK × exp(STDEV / 100))

The ES is based on the mean prediction rank (AVG_RANK) and the standard
deviation from that mean (STDEV) for two or more prediction algorithms, and
ranges from 0 to 100. A peptide predicted to bind to a given HLA and
ranked first by any two or all three prediction programs will score
100 (e.g., KRAS G12R VVVGARGVGK). We used the rank as a metric
because the epitope prediction scores are determined differently
by the three programs and are not directly comparable.
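The epitope score can be sketched directly from the formula. The function name is an assumption, and so is the use of the sample standard deviation: the text does not state whether a sample or population standard deviation was used.

```python
import math
import statistics


def epitope_score(ranks):
    """ES = 100 / (mean rank x exp(standard deviation of ranks / 100)).

    `ranks` holds the rank given to one peptide-HLA pair by each tool that
    reported it (at least two). Sample standard deviation is an assumption.
    """
    if len(ranks) < 2:
        raise ValueError("at least two prediction tools are required")
    avg_rank = statistics.mean(ranks)
    stdev = statistics.stdev(ranks)
    return 100.0 / (avg_rank * math.exp(stdev / 100.0))


print(epitope_score([1, 1]))     # ranked first by two tools
print(epitope_score([1, 1, 1]))  # ranked first by all three tools
print(epitope_score([2, 4]))     # lower and less consistent ranks
```

The exponential term only mildly penalizes disagreement between tools; the mean rank dominates the score.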

Next, we calculated an Epitope Impact Score (EIS).

EIS = ES × average population frequency of presenting HLA allele

The EIS considers the US population frequencies for each HLA
class I allele (http://www.allelefrequencies.net). Only allelic
frequencies with a sample size equal to or greater than 500 were used.
The EIS scales the ES by the frequency of the presenting
allele. Thus, even high-ranking peptides will have a low
EIS if they are predicted to be presented by rare HLA alleles.
For example, the top-ranking KRAS G12R peptide VVVGARGVGK (ES =
100), predicted to be presented by HLA class I coding variant
A*1101 (average population frequency = 7.07%), has a low EIS of
7.07.

For each candidate peptide-HLA pair we calculated a global
score (GS) that takes into account the MIS as

GS = MIS × EIS

And for each gene, we calculated an overall score (OS), which is
the sum of the GS for all predicted epitopes for that gene across all
tumor sites where the mutation is found.

OS = Σ GS
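The three scores chain together as in the sketch below. The function names are assumptions; the numbers are taken from Table 1 (KRAS G12R in pancreas with A*1101, and the two JAK2 V617F epitopes), with the HLA frequency expressed as a fraction.

```python
def epitope_impact_score(es, hla_frequency):
    """EIS = ES x average population frequency of the presenting allele."""
    return es * hla_frequency


def global_score(mis, eis):
    """GS = MIS x EIS for one peptide-HLA pair in one tumor site."""
    return mis * eis


def overall_score(gs_values):
    """OS = sum of GS over all predicted epitopes and tumor sites of a gene."""
    return sum(gs_values)


# KRAS G12R, pancreas: incidence 11.70 per 100k, mutation frequency 7.17%.
mis = 11.70 * 0.0717
eis = epitope_impact_score(100.0, 0.0707)   # ES = 100, A*1101 frequency 7.07%
gs = global_score(mis, eis)
print(round(eis, 2), round(gs, 3))

# JAK2 V617F: the two GS values listed in Table 1.
print(round(overall_score([3.936, 2.022]), 2))
```

These reproduce the tabulated EIS of 7.07 and GS of 5.931 for the KRAS example, and the OS of 5.96 for JAK2.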

To estimate the proportion of the population that could benefit
from vaccination with a combination of predicted epitopes, it is
necessary to determine the proportion of the population that
would carry at least one presenting HLA allele. The proportion (P) of
the population that contains at least one of some larger number of
alleles (K) is determined as described in Gulukota and DeLisi [40].
Assuming that two or more alleles occur with correlated
probabilities, the overall coverage is the sum of individual allele coverage
corrected for the overlaps.

$$P_K = \sum_{i=1}^{K} p_i - \sum_{\text{pairs}} p_{ij} + \sum_{\text{triplets}} p_{ijk} - \cdots$$
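The overlap correction is an inclusion-exclusion sum, which can be sketched as below. This is an illustration under explicit assumptions rather than the published procedure: the inputs are treated as fractions of individuals carrying each allele, and when no joint frequencies are supplied the sketch falls back to independence (products of the individual fractions), which is a simplification of the correlated case described in the text. The function name and the `joint` callback are hypothetical.

```python
from itertools import combinations
from math import prod


def coverage(carrier_fractions, joint=None):
    """Fraction of individuals carrying at least one allele (inclusion-exclusion).

    `joint` maps a tuple of allele indices to the fraction of individuals
    carrying all of them. If omitted, independence is assumed.
    """
    if joint is None:
        def joint(indices):
            return prod(carrier_fractions[i] for i in indices)
    total = 0.0
    n = len(carrier_fractions)
    for size in range(1, n + 1):
        sign = 1.0 if size % 2 == 1 else -1.0   # add singles, subtract pairs, ...
        for indices in combinations(range(n), size):
            total += sign * joint(indices)
    return total


# Illustrative input only: three values from the HLA frequency column of Table 1,
# treated here as carrier fractions for the sake of the example.
fractions = [0.1366, 0.0707, 0.0671]
print(coverage(fractions))
```

Under the independence fallback the result equals one minus the product of the non-carrier fractions, which gives a quick sanity check; supplying measured joint frequencies through `joint` is how correlation between alleles would enter.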

3.  Results 

We used the COSMIC database as a starting point to identify the
most promising tumor mutations for epitope prediction. COSMIC
compiles data on somatic tumor mutations from the public
scientific literature, and it is the most comprehensive repository of
curated somatic mutations in cancer [32-34]. Using our selection
criteria for somatic mutations (minimum frequency 5%, minimum
of five positives from at least 100 samples) we identified 36 distinct
somatic mutations in 20 genes and 19 distinct tumor sites (Table 1).
These consisted of point mutations (29 cases), indels causing
frameshifts (five cases) and codon deletions (two cases). These
results demonstrate that based on mutational screens undertaken
to date, there appear to be relatively few recurrent mutations in
cancer. The highest scoring mutation was JAK2 V617F in
hematopoietic/lymphoid tumors (MIS = 20.8), followed by PIK3CA H1047R
in breast tumors (MIS = 15.1), and the NPM1 W288 frameshift in,
again, hematopoietic/lymphoid tumors (MIS = 8.8). Interestingly,
JAK2 is prominent because it is mutated at high frequency (45.1%)
in a single category of common tumors (hematopoietic and
lymphoid). NPM1, though also mutated only in this same category of
tumors, ranks lower because of lower mutation frequency (19%).

Our  screen  prioritizes  genes  where  mutations  occur  at  specific 
locations  and  in  multiple  cancers.  These  are  the  candidates  where 
the  smallest  number  of  epitopes  could  have  the  greatest  potential 
impact  when  delivered  as  a  vaccine.  As  a  result,  there  are  some 
well-recognized  cancer  genes  that  do  not  figure  prominently  in  our 
results.  For  example,  TP53  is  one  of  the  most  frequently  mutated 
cancer  genes,  but  it  does  not  rank  highly  here  because  mutations 
occur  at  many  sites  throughout  the  gene  rather  than  at  one  or  a 
small  number  of  specific  hotspots.  For  the  present  study  a  low  TP53 
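
Table 1
Predicted mutational tumor epitopes from cancer genes sorted by overall score for each gene. The mutated amino acid is underlined.

| Gene | Mutation (a) | Tissue | Cancer incidence (b) (per 100k) | Mutation frequency (c) | MIS (d) | Peptide | Presenting allele | HLA frequency (e) | EIS (f) | GS (g) | OS (h) |
| KRAS | G13D | Large intestine | 45.94 | 5.42% | 2.49 | VVVGAGDVGK | A*1101 | 7.07% | 5.272 | 13.127 | 506.52 |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.671 | 1.670 |  |
|  |  |  |  |  |  | KLVVVGAGDV | A*0201 | 13.66% | 5.349 | 13.319 |  |
|  |  |  |  |  |  | GAGDVGKSAL | B*3801 | 1.03% | 0.221 | 0.549 |  |
|  |  |  |  |  |  | VVGAGDVGK | A*1101 | 7.07% | 2.089 | 5.201 |  |
|  |  |  |  |  |  | AGDVGKSAL | B*0702 | 6.71% | 3.307 | 8.235 |  |
|  |  |  |  |  |  |  | B*3801 | 1.03% | 0.227 | 0.565 |  |
|  |  |  |  |  |  |  | B*3901 | 0.87% | 0.120 | 0.299 |  |
|  |  |  |  |  |  | DVGKSALTI | B*5101 | 4.29% | 1.057 | 2.631 |  |
|  | G12V | Biliary tract | 1.14 | 5.89% | 0.07 | VVGAVGVGK | A*1101 | 7.07% | 3.013 | 0.202 |  |
|  |  |  |  |  |  | GAVGVGKSAL | B*0702 | 6.71% | 1.072 | 0.072 |  |
|  |  |  |  |  |  |  | B*3801 | 1.03% | 0.221 | 0.015 |  |
|  |  |  |  |  |  | KLVVVGAVGV | A*0201 | 13.66% | 8.102 | 0.544 |  |
|  |  |  |  |  |  | AVGVGKSAL | B*0702 | 6.71% | 4.441 | 0.298 |  |
|  |  |  |  |  |  | VVVGAVGVGK | A*1101 | 7.07% | 7.070 | 0.475 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.751 | 0.050 |  |
|  |  |  |  |  |  | YKLVVVGAV | A*0203 | 0.90% | 0.256 | 0.017 |  |
|  |  |  |  |  |  |  | B*3902 | 0.11% | 0.017 | 0.001 |  |
|  |  |  |  |  |  | LVVVGAVGV | A*0201 | 13.66% | 2.732 | 0.183 |  |
|  |  |  |  |  |  |  | A*0203 | 0.90% | 0.138 | 0.009 |  |
|  |  | Large intestine | 45.94 | 7.29% | 3.35 | VVGAVGVGK | A*1101 | 7.07% | 3.013 | 10.089 |  |
|  |  |  |  |  |  | GAVGVGKSAL | B*0702 | 6.71% | 1.072 | 3.589 |  |
|  |  |  |  |  |  |  | B*3801 | 1.03% | 0.221 | 0.738 |  |
|  |  |  |  |  |  | KLVVVGAVGV | A*0201 | 13.66% | 8.102 | 27.133 |  |
|  |  |  |  |  |  | AVGVGKSAL | B*0702 | 6.71% | 4.441 | 14.873 |  |
|  |  |  |  |  |  | VVVGAVGVGK | A*1101 | 7.07% | 7.070 | 23.678 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.751 | 2.514 |  |
|  |  |  |  |  |  | YKLVVVGAV | A*0203 | 0.90% | 0.256 | 0.859 |  |
|  |  |  |  |  |  |  | B*3902 | 0.11% | 0.017 | 0.056 |  |
|  |  |  |  |  |  | LVVVGAVGV | A*0201 | 13.66% | 2.732 | 9.149 |  |
|  |  |  |  |  |  |  | A*0203 | 0.90% | 0.138 | 0.462 |  |
|  |  | Pancreas | 11.70 | 17.67% | 2.07 | VVGAVGVGK | A*1101 | 7.07% | 3.013 | 6.228 |  |
|  |  |  |  |  |  | GAVGVGKSAL | B*0702 | 6.71% | 1.072 | 2.216 |  |
|  |  |  |  |  |  |  | B*3801 | 1.03% | 0.221 | 0.456 |  |
|  |  |  |  |  |  | KLVVVGAVGV | A*0201 | 13.66% | 8.102 | 16.749 |  |
|  |  |  |  |  |  | AVGVGKSAL | B*0702 | 6.71% | 4.441 | 9.182 |  |
|  |  |  |  |  |  | VVVGAVGVGK | A*1101 | 7.07% | 7.070 | 14.616 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.751 | 1.552 |  |
|  |  |  |  |  |  | YKLVVVGAV | A*0203 | 0.90% | 0.256 | 0.530 |  |
|  |  |  |  |  |  |  | B*3902 | 0.11% | 0.017 | 0.035 |  |
|  |  |  |  |  |  | LVVVGAVGV | A*0201 | 13.66% | 2.732 | 5.648 |  |
|  |  |  |  |  |  |  | A*0203 | 0.90% | 0.138 | 0.285 |  |
|  | G12R | Pancreas | 11.70 | 7.17% | 0.84 | ARGVGKSAL | B*0702 | 6.71% | 1.608 | 1.349 |  |
|  |  |  |  |  |  |  | B*2705 | 1.38% | 0.331 | 0.278 |  |
|  |  |  |  |  |  |  | B*3901 | 0.87% | 0.239 | 0.200 |  |
|  |  |  |  |  |  | GARGVGKSAL | B*0702 | 6.71% | 1.286 | 1.079 |  |
|  |  |  |  |  |  | EYKLVVVGAR | A*3101 | 2.40% | 0.600 | 0.503 |  |
|  |  |  |  |  |  | KLVVVGARGV | A*0201 | 13.66% | 3.633 | 3.048 |  |
|  |  |  |  |  |  | VVGARGVGK | A*1101 | 7.07% | 3.013 | 2.527 |  |
|  |  |  |  |  |  | RGVGKSALTI | B*0702 | 6.71% | 1.087 | 0.912 |  |
|  |  |  |  |  |  | VVVGARGVGK | A*1101 | 7.07% | 7.070 | 5.931 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.671 | 0.563 |  |
|  |  |  |  |  |  | LVVVGARGV | A*0203 | 0.90% | 0.138 | 0.116 |  |
|  | G12D | Biliary tract | 1.14 | 15.43% | 0.18 | VVGADGVGK | A*1101 | 7.07% | 2.333 | 0.410 |  |
|  |  |  |  |  |  | LVVVGADGV | A*0201 | 13.66% | 1.683 | 0.296 |  |
|  |  |  |  |  |  |  | A*0203 | 0.90% | 0.127 | 0.022 |  |
|  |  |  |  |  |  | KLVVVGADGV | A*0201 | 13.66% | 6.713 | 1.181 |  |
|  |  |  |  |  |  | GADGVGKSAL | B*3801 | 1.03% | 0.680 | 0.120 |  |
|  |  |  |  |  |  | VVVGADGVGK | A*1101 | 7.07% | 5.272 | 0.927 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.671 | 0.118 |  |
|  |  | Endometrium | 23.49 | 5.67% | 1.33 | VVGADGVGK | A*1101 | 7.07% | 2.333 | 3.108 |  |
|  |  |  |  |  |  | LVVVGADGV | A*0201 | 13.66% | 1.683 | 2.242 |  |
|  |  |  |  |  |  |  | A*0203 | 0.90% | 0.127 | 0.170 |  |
|  |  |  |  |  |  | KLVVVGADGV | A*0201 | 13.66% | 6.713 | 8.940 |  |
|  |  |  |  |  |  | GADGVGKSAL | B*3801 | 1.03% | 0.680 | 0.906 |  |
|  |  |  |  |  |  | VVVGADGVGK | A*1101 | 7.07% | 5.272 | 7.022 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.671 | 0.893 |  |
|  |  | Large intestine | 45.94 | 11.32% | 5.20 | VVGADGVGK | A*1101 | 7.07% | 2.333 | 12.134 |  |
|  |  |  |  |  |  | LVVVGADGV | A*0201 | 13.66% | 1.683 | 8.755 |  |
|  |  |  |  |  |  |  | A*0203 | 0.90% | 0.127 | 0.662 |  |
|  |  |  |  |  |  | KLVVVGADGV | A*0201 | 13.66% | 6.713 | 34.908 |  |
|  |  |  |  |  |  | GADGVGKSAL | B*3801 | 1.03% | 0.680 | 3.539 |  |
|  |  |  |  |  |  | VVVGADGVGK | A*1101 | 7.07% | 5.272 | 27.416 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.671 | 3.488 |  |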
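|  |  | Ovary | 12.62 | 5.47% | 0.69 | VVGADGVGK | A*1101 | 7.07% | 2.333 | 1.611 |  |
|  |  |  |  |  |  | LVVVGADGV | A*0201 | 13.66% | 1.683 | 1.162 |  |
|  |  |  |  |  |  |  | A*0203 | 0.90% | 0.127 | 0.088 |  |
|  |  |  |  |  |  | KLVVVGADGV | A*0201 | 13.66% | 6.713 | 4.634 |  |
|  |  |  |  |  |  | GADGVGKSAL | B*3801 | 1.03% | 0.680 | 0.470 |  |
|  |  |  |  |  |  | VVVGADGVGK | A*1101 | 7.07% | 5.272 | 3.639 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.671 | 0.463 |  |
|  |  | Pancreas | 11.70 | 28.57% | 3.34 | VVGADGVGK | A*1101 | 7.07% | 2.333 | 7.799 |  |
|  |  |  |  |  |  | LVVVGADGV | A*0201 | 13.66% | 1.683 | 5.627 |  |
|  |  |  |  |  |  |  | A*0203 | 0.90% | 0.127 | 0.426 |  |
|  |  |  |  |  |  | KLVVVGADGV | A*0201 | 13.66% | 6.713 | 22.438 |  |
|  |  |  |  |  |  | GADGVGKSAL | B*3801 | 1.03% | 0.680 | 2.275 |  |
|  |  |  |  |  |  | VVVGADGVGK | A*1101 | 7.07% | 5.272 | 17.622 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.671 | 2.242 |  |
|  |  | Small intestine | 1.92 | 8.33% | 0.16 | VVGADGVGK | A*1101 | 7.07% | 2.333 | 0.373 |  |
|  |  |  |  |  |  | LVVVGADGV | A*0201 | 13.66% | 1.683 | 0.269 |  |
|  |  |  |  |  |  |  | A*0203 | 0.90% | 0.127 | 0.020 |  |
|  |  |  |  |  |  | KLVVVGADGV | A*0201 | 13.66% | 6.713 | 1.074 |  |
|  |  |  |  |  |  | GADGVGKSAL | B*3801 | 1.03% | 0.680 | 0.109 |  |
|  |  |  |  |  |  | VVVGADGVGK | A*1101 | 7.07% | 5.272 | 0.843 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.671 | 0.107 |  |
|  | G12C | Lung | 60.66 | 7.38% | 4.48 | ACGVGKSAL | B*0702 | 6.71% | 1.850 | 8.283 |  |
|  |  |  |  |  |  | VVVGACGVGK | A*1101 | 7.07% | 7.070 | 31.650 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.671 | 3.003 |  |
|  |  |  |  |  |  | LVVVGACGV | A*0201 | 13.66% | 2.245 | 10.049 |  |
|  |  |  |  |  |  |  | A*0203 | 0.90% | 0.151 | 0.674 |  |
|  |  |  |  |  |  | VVGACGVGK | A*1101 | 7.07% | 2.333 | 10.445 |  |
|  |  |  |  |  |  | GACGVGKSAL | B*3801 | 1.03% | 0.221 | 0.987 |  |
|  |  |  |  |  |  | KLVVVGACGV | A*0201 | 13.66% | 8.102 | 36.269 |  |
| NPM1 | W288fs*12 | Hematopoietic and lymphoid tissue | 46.19 | 19.03% | 8.79 | DLCLAVEEV | A*0201 | 13.66% | 1.635 | 14.368 | 87.49 |
|  |  |  |  |  |  | DQEAIQDLCL | B*3801 | 1.03% | 0.118 | 1.041 |  |
|  |  |  |  |  |  | EAIQDLCLA | A*6801 | 3.50% | 0.250 | 2.197 |  |
|  |  |  |  |  |  |  | B*4501 | 2.35% | 0.261 | 2.295 |  |
|  |  |  |  |  |  | AIQDLCLAV | A*0201 | 13.66% | 4.051 | 35.607 |  |
|  |  |  |  |  |  | QEAIQDLCL | B*4001 | 3.26% | 0.924 | 8.121 |  |
|  |  |  |  |  |  |  | B*4403 | 5.12% | 0.711 | 6.247 |  |
|  |  |  |  |  |  | QEAIQDLCLA | B*4403 | 5.12% | 2.004 | 17.615 |  |
| HRAS | G12V | Urinary tract | 20.27 | 7.73% | 1.57 | VVGAVGVGK | A*1101 | 7.07% | 4.218 | 6.608 | 49.24 |
|  |  |  |  |  |  | GAVGVGKSAL | B*0702 | 6.71% | 1.102 | 1.727 |  |
|  |  |  |  |  |  |  | B*3801 | 1.03% | 0.250 | 0.391 |  |
|  |  |  |  |  |  | KLVVVGAVGV | A*0201 | 13.66% | 8.102 | 12.694 |  |
|  |  |  |  |  |  | AVGVGKSAL | B*0702 | 6.71% | 6.709 | 10.512 |  |
|  |  |  |  |  |  | VVVGAVGVGK | A*1101 | 7.07% | 7.070 | 11.078 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.838 | 1.314 |  |
|  |  |  |  |  |  | YKLVVVGAV | A*0203 | 0.90% | 0.256 | 0.402 |  |
|  |  |  |  |  |  | LVVVGAVGV | A*0201 | 13.66% | 2.732 | 4.281 |  |
|  |  |  |  |  |  |  | A*0203 | 0.90% | 0.151 | 0.236 |  |
| FGFR3 | Y373C | Urinary tract | 20.27 | 8.95% | 1.81 | DEAGSVCAG | B*4403 | 5.12% | 0.188 | 0.341 | 42.06 |
|  |  |  |  |  |  | DEAGSVCAGI | B*4402 | 2.85% | 0.214 | 0.388 |  |
|  |  |  |  |  |  |  | B*4403 | 5.12% | 0.557 | 1.010 |  |
|  |  |  |  |  |  | SVCAGILSY | A*1101 | 7.07% | 0.307 | 0.557 |  |
|  |  |  |  |  |  |  | B*1501 | 2.64% | 1.749 | 3.173 |  |
|  |  |  |  |  |  | CAGILSYGV | B*5101 | 4.29% | 0.144 | 0.261 |  |
|  |  |  |  |  |  | GSVCAGILSY | A*1101 | 7.07% | 0.264 | 0.479 |  |
|  |  |  |  |  |  | EAGSVCAGI | B*5101 | 4.29% | 0.300 | 0.544 |  |
|  | S249C | Urinary tract | 20.27 | 29.97% | 6.07 | ERCPHRPIL | B*2705 | 1.38% | 0.064 | 0.389 |  |
|  |  |  |  |  |  |  | B*3901 | 0.87% | 0.054 | 0.326 |  |
|  |  |  |  |  |  | LERCPHRPI | B*4403 | 5.12% | 0.136 | 0.828 |  |
|  |  |  |  |  |  | TYTLDVLERC | A*2402 | 9.29% | 0.247 | 1.502 |  |
|  |  |  |  |  |  | DVLERCPHR | A*1101 | 7.07% | 0.224 | 1.359 |  |
|  |  |  |  |  |  |  | A*3101 | 2.40% | 0.115 | 0.702 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.376 | 2.281 |  |
|  |  |  |  |  |  | CPHRPILQA | B*0702 | 6.71% | 0.311 | 1.889 |  |
|  | R248C | Skin | 19.79 | 13.07% | 2.59 | TYTLDVLECS | A*2402 | 9.29% | 0.247 | 0.639 |  |
|  |  |  |  |  |  | DVLECSPHR | A*1101 | 7.07% | 0.224 | 0.579 |  |
|  |  |  |  |  |  |  | A*3101 | 2.40% | 0.095 | 0.245 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.312 | 0.808 |  |
|  |  |  |  |  |  | LECSPHRPI | B*4403 | 5.12% | 0.190 | 0.492 |  |
|  | G697C | Upper aerodigestive tract | 13.51 | 18.80% | 2.54 | CIPVEELFK | A*1101 | 7.07% | 0.408 | 1.035 |  |
|  |  |  |  |  |  | LGGSPYPCI | B*5101 | 4.29% | 0.163 | 0.413 |  |
|  |  |  |  |  |  | YPCIPVEELF | A*2402 | 9.29% | 0.266 | 0.674 |  |
|  |  |  |  |  |  |  | B*3501 | 5.82% | 0.916 | 2.327 |  |
|  |  |  |  |  |  | PCIPVEELF | A*2402 | 9.29% | 0.492 | 1.250 |  |
|  |  |  |  |  |  | TLGGSPYPCI | A*0201 | 13.66% | 0.455 | 1.156 |  |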
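|  |  |  |  |  |  | CIPVEELFKL | A*0201 | 13.66% | 0.963 | 2.446 |  |
|  |  |  |  |  |  |  | B*3801 | 1.03% | 0.030 | 0.075 |  |
|  |  |  |  |  |  | PYPCIPVEE | A*2402 | 9.29% | 0.430 | 1.092 |  |
|  |  |  |  |  |  | PYPCIPVEEL | A*2402 | 9.29% | 3.637 | 9.238 |  |
|  |  |  |  |  |  | YPCIPVEEL | B*0702 | 6.71% | 0.793 | 2.013 |  |
|  |  |  |  |  |  |  | B*3501 | 5.82% | 0.404 | 1.026 |  |
|  |  |  |  |  |  |  | B*5101 | 4.29% | 0.206 | 0.523 |  |
| MSH6 | P1087fs*5 | Large intestine | 45.94 | 6.07% | 2.79 | LPEDTPPLL | B*0702 | 6.71% | 0.876 | 2.442 | 32.21 |
|  |  |  |  |  |  |  | B*3501 | 5.82% | 0.260 | 0.724 |  |
|  |  |  |  |  |  |  | B*3801 | 1.03% | 0.053 | 0.148 |  |
|  |  |  |  |  |  |  | B*5101 | 4.29% | 0.443 | 1.236 |  |
|  |  |  |  |  |  | ILLPEDTPPL | A*0201 | 13.66% | 4.475 | 12.479 |  |
|  |  |  |  |  |  | LLPEDTPPLL | A*0201 | 13.66% | 1.631 | 4.548 |  |
|  |  |  |  |  |  |  | A*2402 | 9.29% | 0.878 | 2.449 |  |
|  |  |  |  |  |  |  | B*3801 | 1.03% | 0.026 | 0.071 |  |
|  |  |  |  |  |  | LLPEDTPPL | A*0201 | 13.66% | 2.910 | 8.115 |  |
| TP53 | R273H | Large intestine | 45.94 | 5.31% | 2.44 | GRNSFEVHV | B*2705 | 1.38% | 0.108 | 0.264 | 26.52 |
|  |  |  |  |  |  | HVCACPGRDR | A*3101 | 2.40% | 0.209 | 0.510 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.322 | 0.785 |  |
|  |  |  |  |  |  | EVHVCACPGR | A*1101 | 7.07% | 0.610 | 1.489 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 1.725 | 4.208 |  |
|  |  |  |  |  |  | GRNSFEVHVC | B*2705 | 1.38% | 0.119 | 0.291 |  |
|  | R248Q | Hematopoietic and lymphoid tissue | 46.19 | 7.69% | 3.55 | QRPILTIITL | A*2402 | 9.29% | 0.587 | 2.084 |  |
|  |  |  |  |  |  | MGGMNQRPIL | B*5101 | 4.29% | 0.352 | 1.251 |  |
|  |  |  |  |  |  | SCMGGMNQR | A*3101 | 2.40% | 0.269 | 0.954 |  |
|  |  |  |  |  |  | MGGMNQRPI | B*5101 | 4.29% | 0.778 | 2.765 |  |
|  |  |  |  |  |  | SSCMGGMNQR | A*1101 | 7.07% | 0.859 | 3.052 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.301 | 1.068 |  |
|  |  |  |  |  |  | GMNQRPILTI | A*0201 | 13.66% | 2.196 | 7.800 |  |
| EGFR | L858R | Lung | 60.66 | 9.60% | 5.82 | ITDFGRAKLL | B*3801 | 1.03% | 0.054 | 0.312 | 21.28 |
|  |  |  |  |  |  | HVKITDFGR | A*1101 | 7.07% | 0.252 | 1.470 |  |
|  |  |  |  |  |  |  | A*3101 | 2.40% | 1.200 | 6.988 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.084 | 0.491 |  |
|  |  |  |  |  |  | KITDFGRAK | A*1101 | 7.07% | 0.249 | 1.449 |  |
|  |  |  |  |  |  | RAKLLGAEEK | A*1101 | 7.07% | 0.447 | 2.600 |  |
|  |  |  |  |  |  | KITDFGRAKL | A*0201 | 13.66% | 0.594 | 3.461 |  |
|  |  |  |  |  |  | GRAKLLGAE | B*2705 | 1.38% | 0.026 | 0.153 |  |
|  |  |  |  |  |  | ITDFGRAKL | B*3801 | 1.03% | 0.062 | 0.359 |  |
|  | E746_A750del | Lung | 60.66 | 5.86% | 3.55 | IPVAIKTSPK | A*1101 | 7.07% | 0.240 | 0.854 |  |
|  |  |  |  |  |  | PVAIKTSPK | A*1101 | 7.07% | 0.287 | 1.020 |  |
|  |  |  |  |  |  | AIKTSPKANK | A*1101 | 7.07% | 0.418 | 1.485 |  |
|  |  |  |  |  |  | TSPKANKEI | A*2402 | 9.29% | 0.180 | 0.641 |  |
| PIK3CA | H1047R | Breast | 120.81 | 12.51% | 15.11 | FMKQMNDAR | A*3101 | 2.40% | 0.085 | 1.279 | 18.84 |
|  |  |  |  |  |  | YFMKQMNDAR | A*3101 | 2.40% | 0.106 | 1.603 |  |
|  |  |  |  |  |  | ARHGGWTTK | B*2705 | 1.38% | 0.336 | 5.073 |  |
|  |  |  |  |  |  | ARHGGWTTKM | B*2705 | 1.38% | 0.037 | 0.557 |  |
|  |  | Endometrium | 23.49 | 5.59% | 1.31 | FMKQMNDAR | A*3101 | 2.40% | 0.085 | 0.111 |  |
|  |  |  |  |  |  | YFMKQMNDAR | A*3101 | 2.40% | 0.106 | 0.139 |  |
|  |  |  |  |  |  | ARHGGWTTK | B*2705 | 1.38% | 0.336 | 0.441 |  |
|  |  |  |  |  |  | ARHGGWTTKM | B*2705 | 1.38% | 0.037 | 0.048 |  |
|  | E545K | Breast | 120.81 | 5.39% | 6.51 | SEITKQEKD | B*4403 | 5.12% | 0.112 | 0.731 |  |
|  |  |  |  |  |  | SEITKQEKDF | B*4402 | 2.85% | 0.374 | 2.436 |  |
|  |  |  |  |  |  |  | B*4403 | 5.12% | 0.332 | 2.163 |  |
|  |  |  |  |  |  | LSEITKQEK | A*1101 | 7.07% | 0.191 | 1.241 |  |
|  |  |  |  |  |  | PLSEITKQEK | A*1101 | 7.07% | 0.148 | 0.961 |  |
|  |  | Cervix | 7.98 | 5.04% | 0.40 | SEITKQEKD | B*4403 | 5.12% | 0.112 | 0.045 |  |
|  |  |  |  |  |  | SEITKQEKDF | B*4402 | 2.85% | 0.374 | 0.150 |  |
|  |  |  |  |  |  |  | B*4403 | 5.12% | 0.332 | 0.134 |  |
|  |  |  |  |  |  | LSEITKQEK | A*1101 | 7.07% | 0.191 | 0.077 |  |
|  |  |  |  |  |  | PLSEITKQEK | A*1101 | 7.07% | 0.148 | 0.059 |  |
|  |  | Urinary tract | 20.27 | 6.79% | 1.38 | SEITKQEKD | B*4403 | 5.12% | 0.112 | 0.155 |  |
|  |  |  |  |  |  | SEITKQEKDF | B*4402 | 2.85% | 0.374 | 0.515 |  |
|  |  |  |  |  |  |  | B*4403 | 5.12% | 0.332 | 0.457 |  |
|  |  |  |  |  |  | LSEITKQEK | A*1101 | 7.07% | 0.191 | 0.262 |  |
|  |  |  |  |  |  | PLSEITKQEK | A*1101 | 7.07% | 0.148 | 0.203 |  |
| ABL1 | T315I | Hematopoietic and lymphoid tissue | 46.19 | 5.49% | 2.54 | FYIIIEFMTY | A*2402 | 9.29% | 0.221 | 0.561 | 9.80 |
|  |  |  |  |  |  | IIIEFMTYG | A*0201 | 13.66% | 0.329 | 0.835 |  |
|  |  |  |  |  |  | YIIIEFMTY | B*1501 | 2.64% | 0.257 | 0.652 |  |
|  |  |  |  |  |  |  | B*3501 | 5.82% | 0.148 | 0.374 |  |
|  |  |  |  |  |  | IEFMTYGNL | B*4001 | 3.26% | 0.780 | 1.979 |  |
|  |  |  |  |  |  |  | B*4403 | 5.12% | 0.121 | 0.307 |  |
|  |  |  |  |  |  | REPPFYIII | A*2402 | 9.29% | 0.195 | 0.495 |  |
|  |  |  |  |  |  |  | B*4001 | 3.26% | 0.202 | 0.512 |  |
|  |  |  |  |  |  |  | B*4403 | 5.12% | 0.193 | 0.490 |  |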
|  |  |  |  |  |  | IEFMTYGNLL | B*4402 | 2.85% | 0.412 | 1.044 |  |
|  |  |  |  |  |  |  | B*4403 | 5.12% | 0.120 | 0.305 |  |
|  |  |  |  |  |  | IIEFMTYGNL | B*3801 | 1.03% | 0.030 | 0.077 |  |
|  |  |  |  |  |  | TREPPFYIII | B*2705 | 1.38% | 0.042 | 0.107 |  |
|  |  |  |  |  |  | EPPFYIIIEF | A*2402 | 9.29% | 0.371 | 0.941 |  |
|  |  |  |  |  |  |  | B*3501 | 5.82% | 0.339 | 0.859 |  |
|  |  |  |  |  |  |  | B*3801 | 1.03% | 0.028 | 0.071 |  |
|  |  |  |  |  |  |  | B*4402 | 2.85% | 0.073 | 0.185 |  |
| CTNNB1 | T41A | Soft tissue | 3.03 | 15.02% | 0.46 | IHSGATATA | A*0203 | 0.90% | 0.039 | 0.018 | 8.36 |
|  |  |  |  |  |  |  | B*3801 | 1.03% | 0.084 | 0.038 |  |
|  |  |  |  |  |  | TATAPSLSGK | A*1101 | 7.07% | 0.326 | 0.148 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.236 | 0.108 |  |
|  |  |  |  |  |  | ATAPSLSGK | A*1101 | 7.07% | 3.535 | 1.609 |  |
|  |  |  |  |  |  |  | A*3101 | 2.40% | 0.069 | 0.031 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.493 | 0.224 |  |
|  |  |  |  |  |  | GATATAPSL | B*5101 | 4.29% | 0.143 | 0.065 |  |
|  |  |  |  |  |  | GIHSGATATA | A*0203 | 0.90% | 0.066 | 0.030 |  |
|  |  |  |  |  |  | ATATAPSLSG | A*1101 | 7.07% | 0.213 | 0.097 |  |
|  | S45del | Kidney | 14.12 | 5.15% | 0.73 | SGATTTAPL | B*0702 | 6.71% | 0.257 | 0.187 |  |
|  |  |  |  |  |  | ATTTAPLSGK | A*1101 | 7.07% | 0.935 | 0.680 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.200 | 0.146 |  |
|  |  |  |  |  |  | TTTAPLSGK | A*1101 | 7.07% | 2.621 | 1.906 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.690 | 0.502 |  |
|  |  |  |  |  |  | APLSGKGNPE | B*0702 | 6.71% | 0.239 | 0.174 |  |
|  |  |  |  |  |  | HSGATTTAPL | B*0702 | 6.71% | 0.276 | 0.201 |  |
|  | S45F | Soft tissue | 3.03 | 9.86% | 0.30 | GATTTAPFL | B*5101 | 4.29% | 0.143 | 0.043 |  |
|  |  |  |  |  |  | TTAPFLSGK | A*1101 | 7.07% | 3.535 | 1.056 |  |
|  |  |  |  |  |  |  | A*3101 | 2.40% | 0.100 | 0.030 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.965 | 0.288 |  |
|  |  |  |  |  |  | TTTAPFLSGK | A*1101 | 7.07% | 0.713 | 0.213 |  |
|  |  |  |  |  |  |  | A*6801 | 3.50% | 0.466 | 0.139 |  |
|  |  |  |  |  |  | TTTAPFLSG | A*1101 | 7.07% | 0.183 | 0.055 |  |
|  |  |  |  |  |  | HSGATTTAPF | B*3501 | 5.82% | 0.520 | 0.155 |  |
|  | S37A | Small intestine | 1.92 | 7.77% | 0.15 | SYLDSGIHAG | A*2402 | 9.29% | 0.398 | 0.059 |  |
|  |  |  |  |  |  | SYLDSGIHA | A*2402 | 9.29% | 0.422 | 0.063 |  |
|  |  |  |  |  |  | YLDSGIHAGA | A*0203 | 0.90% | 0.115 | 0.017 |  |
|  |  |  |  |  |  | GIHAGATTTA | A*0203 | 0.90% | 0.038 | 0.006 |  |
|  |  |  |  |  |  | AGATTTAPSL | B*0702 | 6.71% | 0.362 | 0.054 |  |
|  |  |  |  |  |  | IHAGATTTA | A*0203 | 0.90% | 0.039 | 0.006 |  |
|  |  |  |  |  |  |  | B*3801 | 1.03% | 0.084 | 0.013 |  |
| KIT | D816V | Hematopoietic and lymphoid tissue | 46.19 | 9.59% | 4.43 | RVIKNDSNY | B*1501 | 2.64% | 0.372 | 1.648 | 8.35 |
|  |  |  |  |  |  | KICDFGLARV | A*0201 | 13.66% | 1.392 | 6.165 |  |
|  |  |  |  |  |  |  | A*3101 | 2.40% | 0.058 | 0.257 |  |
|  |  |  |  |  |  | ARVIKNDSN | B*2705 | 1.38% | 0.062 | 0.275 |  |
| HNF1A | P291fs*51 | Large intestine | 45.94 | 10.94% | 5.03 | PPgGQARDL | B*0702 | 6.71% | 1.072 | 5.386 | 7.97 |
|  |  |  |  |  |  | GPPgGQARDL | B*5101 | 4.29% | 0.514 | 2.581 |  |
| JAK2 | V617F | Hematopoietic and lymphoid tissue | 46.19 | 45.12% | 20.84 | CFCGDENIL | A*2402 | 9.29% | 0.189 | 3.936 | 5.96 |
|  |  |  |  |  |  | LVLNYGVCF | B*1501 | 2.64% | 0.097 | 2.022 |  |
| BRAF | V600E | Biliary tract | 1.14 | 10.06% | 0.11 | GDFGLATEK | A*1101 | 7.07% | 0.286 | 0.033 | 5.16 |
|  |  | Eye | 0.77 | 11.35% | 0.09 | GDFGLATEK | A*1101 | 7.07% | 0.286 | 0.025 |  |
|  |  | Large intestine | 45.94 | 12.78% | 5.87 | GDFGLATEK | A*1101 | 7.07% | 0.286 | 1.681 |  |
|  |  | Ovary | 12.62 | 8.70% | 1.10 | GDFGLATEK | A*1101 | 7.07% | 0.286 | 0.314 |  |
|  |  | Skin | 19.79 | 33.51% | 6.63 | GDFGLATEK | A*1101 | 7.07% | 0.286 | 1.899 |  |
|  |  | Thyroid | 10.60 | 39.71% | 4.21 | GDFGLATEK | A*1101 | 7.07% | 0.286 | 1.205 |  |
| IDH1 | R132H | Central nervous system | 6.09 | 30.77% | 1.87 | KPIIIGHHA | B*0702 | 6.71% | 1.025 | 1.921 | 4.24 |
|  |  |  |  |  |  | KPIIIGHHAY | B*3501 | 5.82% | 0.903 | 1.693 |  |
|  |  |  |  |  |  | PIIIGHHAY | B*1501 | 2.64% | 0.335 | 0.628 |  |
| NRAS | Q61R | Skin | 19.79 | 7.12% | 1.41 | LLDILDTAGR | A*6801 | 3.50% | 0.527 | 0.743 | 2.72 |
|  |  |  |  |  |  | GREEYSAMR | B*2705 | 1.38% | 0.448 | 0.631 |  |
| NRAS | Q61K | Skin | 19.79 | 6.84% | 1.35 | LLDILDTAGK | A*1101 | 7.07% | 0.996 | 1.348 |  |
| RET | M918T | Thyroid | 10.60 | 28.38% | 3.01 | VKWTAIESL | B*3902 | 0.11% | 0.042 | 0.126 | 2.51 |
|  |  |  |  |  |  | VKWTAIESLF | A*2402 | 9.29% | 0.403 | 1.213 |  |
|  |  |  |  |  |  | GRIPVKWTA | B*2705 | 1.38% | 0.036 | 0.108 |  |
|  |  |  |  |  |  | GRIPVKWTAI | B*2705 | 1.38% | 0.050 | 0.151 |  |
|  |  |  |  |  |  | KWTAIESLF | A*2402 | 9.29% | 0.304 | 0.915 |  |
| PDGFRA | D842V | Soft tissue | 3.03 | 7.00% | 0.21 | ARVIMHDSN | B*2705 | 1.38% | 0.060 | 0.013 | 0.65 |
|  |  |  |  |  |  | RVIMHDSNY | A*1101 | 7.07% | 0.144 | 0.031 |  |
|  |  |  |  |  |  |  | B*1501 | 2.64% | 0.575 | 0.122 |  |
|  |  |  |  |  |  | KICDFGLARV | A*0201 | 13.66% | 1.212 | 0.257 |  |
|  |  |  |  |  |  | VIMHDSNYV | A*0201 | 13.66% | 0.500 | 0.106 |  |
|  |  |  |  |  |  | RVIMHDSNYV | A*0201 | 13.66% | 0.553 | 0.117 |  |
| MET | Y1253D | Upper aerodigestive tract | 13.51 | 8.81% | 1.19 | DMYDKEYDSV | A*0201 | 13.66% | 0.297 | 0.353 | 0.58 |
|  |  |  |  |  |  | KEYDSVHNK | A*1101 | 7.07% | 0.156 | 0.186 |  |
|  |  |  |  |  |  |  | B*2705 | 1.38% | 0.032 | 0.039 |  |

(a) HGVS notation for describing mutations is a system produced by the Human Genome Variation Society (http://www.hgvs.org/).
(b) Cancer incidence for US population from http://seer.cancer.gov.
(c) Mutation frequency calculated from the Catalogue of Somatic Mutations in Cancer (COSMIC: Forbes et al. 2008).
(d) MIS (Mutation Impact Score) = frequency of mutation in given cancer × cancer incidence.
(e) Average HLA allele frequency calculated from www.allelefrequencies.net for all US ethnic groups with n >= 500.
(f) EIS (Epitope Impact Score) = epitope prediction score × average frequency of presenting HLA allele.
(g) GS (Global Score) = MIS × EIS.
(h) OS (Overall Score) = Σ GS.

score is beneficial, because epitopes from distributed mutations are
likely to be inferior to those derived from hotspot mutations. This is
because unlike hotspot mutations, which are typically early-occurring,
gain-of-function mutations essential for tumor viability,
distributed mutations are usually loss-of-function mutations that arise
later in tumorigenesis. Loss-of-function mutations are undesirable
immunologic targets because genes mutated in this manner
become nonessential to the cell, and immune evasion can result from
downregulation of mutant epitope presentation. Mutations
incurred late in the tumorigenic process are generally undesirable
immunologic targets for the added reason that well-developed
tumors are immunoresistant, and the opportunity for prevention
has passed.

To  be  useful  as  a  vaccine  epitope,  a  mutation  must  be  immuno¬ 
genic.  That  is,  it  must  be  contained  within  a  peptide  that  is  pre¬ 
sented  at  the  cell  surface  by  HLA  class  1  molecules  and,  ultimately, 
be  sufficiently  distinct  from  the  wild-type  version  of  the  same 
peptide  to  cause  a  T-cell  response.  Here,  we  used  computational 
epitope  prediction  methods  to  identify  common  tumor  mutations 
that  may  also  be  immunogenic.  It  is  important  to  note  that  this 
approach  predicts  only  peptide-HLA  binding,  which  is  taken  as  a 
correlate  of  immunogenicity.  Whether  a  given  peptide  is  truly 
immunogenic  also  depends  on  it  being  appropriately  processed 
and  loaded  on  HLA  molecules,  and  being  sufficiently  distinct  from 
the  presumably  tolerated,  nonmutated  version.  In  the  present 
study,  we  estimate  the  proportion  of  tumor  mutations  that  could  be 
immunogenic,  and  should  be  prioritized  for  further  investigation. 
At present, the cost and throughput of laboratory immunoassays
are prohibitive for screening large numbers of candidate mutations,
and a computational approach that shortlists candidates is warranted.
For in silico epitope prediction tools, false-positive predictions
are known to occur [39,41]. To curb this problem, only peptides
predicted by two or more tools and scoring within the top-ranking
5% are considered here.
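
The consensus filter described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function name, the tool labels, and the input layout are hypothetical, and it assumes every tool's output has already been converted to a score where higher means stronger predicted binding (tools that report IC50 values would need to be inverted first).

```python
from collections import defaultdict


def consensus_peptides(predictions, top_fraction=0.05, min_tools=2):
    """Keep peptides ranked in the top fraction by at least `min_tools` tools.

    predictions: {tool_name: {peptide: score}} for one gene and one HLA
    allele, with higher scores meaning stronger predicted binding
    (an assumption of this sketch).
    """
    votes = defaultdict(set)
    for tool, scores in predictions.items():
        ranked = sorted(scores, key=scores.get, reverse=True)
        n_keep = max(1, int(len(ranked) * top_fraction))
        for peptide in ranked[:n_keep]:
            votes[peptide].add(tool)
    return {p: sorted(tools) for p, tools in votes.items()
            if len(tools) >= min_tools}


if __name__ == "__main__":
    # Toy scores for 20 placeholder peptides; not real prediction output.
    peptides = [f"PEP{i}" for i in range(20)]
    toy = {
        "tool_a": {p: 20 - i for i, p in enumerate(peptides)},
        "tool_b": {p: 40 - 2 * i for i, p in enumerate(peptides)},
        "tool_c": {p: i for i, p in enumerate(peptides)},
    }
    print(consensus_peptides(toy))
```

Requiring agreement between independent tools trades sensitivity for specificity, which matches the stated aim of curbing false-positive predictions.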

Taking advantage of three well-established in silico epitope
prediction programs, we queried in bulk every mutation or
frameshift-peptide from candidate cancer genes identified, as described
above, using the COSMIC database. Again, only peptides ranking within
the top 5% of all possible peptide degradation products from a given
gene, and predicted independently by at least two of the three
programs, were retained for analysis. These predictions, in
conjunction with HLA class I allele frequencies for the US population
(http://www.allelefrequencies.net), were used to produce an EIS for
each mutant peptide (see Methods).


We screened all possible tiled peptides derived from the 20
genes and 36 mutations identified against all HLA class I variants
represented by the three in silico epitope prediction programs, for
54,432 individual queries. For each gene, and 35 of the 36 mutations,
we obtained one or more epitopes that passed filtering. There
were 229 total and 159 unique peptides predicted to be presented
by up to four distinct HLA class I alleles. Interestingly, of the 10
peptides with the highest EIS, seven are predicted to arise from the
KRAS oncogene. Six of these are position G12 mutants (two G12V,
two G12C, one G12R, and one G12D) and one is a G13D mutant. Most KRAS
mutational peptides were predicted to be high-ranking binders of
some of the most frequently represented HLA alleles in the US
population, which is an important factor in evaluating their EIS.
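
To make "tiled peptides" concrete, the sketch below enumerates every peptide of a fixed length that overlaps a substituted residue. It is an illustration only: the function name is hypothetical, the 9-residue default length is an assumption (the text does not state which lengths were queried), and only the first 25 residues of KRAS are used as input.

```python
def tiled_mutant_peptides(protein, ref, position, alt, length=9):
    """Return all `length`-mers that overlap a substituted residue.

    protein:  wild-type amino acid sequence
    ref, alt: wild-type and mutant residue (one-letter codes)
    position: 1-based position of the substitution
    """
    idx = position - 1
    if protein[idx] != ref:
        raise ValueError(f"expected {ref} at position {position}, "
                         f"found {protein[idx]}")
    mutant = protein[:idx] + alt + protein[idx + 1:]
    first = max(0, idx - length + 1)         # leftmost window covering idx
    last = min(idx, len(mutant) - length)    # rightmost window covering idx
    return [mutant[s:s + length] for s in range(first, last + 1)]


# N-terminal 25 residues of human KRAS, enough to tile codons 12 and 13.
KRAS_NTERM = "MTEYKLVVVGAGGVGKSALTIQLIQ"

if __name__ == "__main__":
    for peptide in tiled_mutant_peptides(KRAS_NTERM, "G", 12, "V"):
        print(peptide)
```

Each resulting peptide would then be submitted to every supported HLA variant in each prediction tool, which is how a modest number of mutations expands into tens of thousands of individual queries.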

Finally,  we  ranked  the  20  cancer  genes  according  to  an  OS, 
which  took  into  account  all  of  our  criteria,  including  the  frequency 
of  mutations  within  each  gene  in  specific  tumors,  the  prevalence  of 
those  tumors,  the  likelihood  of  mutations  to  be  presented  as  HLA 
class  I-restricted  epitopes,  and  the  population  frequency  of  HLA 
alleles. The top-ranking gene was KRAS (OS = 506.52), followed by
NPM1 (OS = 87.49) and HRAS (OS = 49.24). The lowest ranked of the
20 genes was APC, with an OS of just 0.14. KRAS figures prominently
because there are frequent hotspot mutations in numerous, prevalent
cancers, and mutant peptides are predicted to be presented by
several common HLA alleles (Table 1). Relative to KRAS, APC mutations
occur infrequently in rare cancers, and are predicted to be
presented weakly by rare HLA alleles. Our data illustrate other
interesting features of predicted tumor epitopes. For example, we
observe that a single peptide, KICDFGLARV, is shared by PDGFRA
D842V and KIT D816V and, therefore, constitutes an attractive
candidate epitope for simultaneous targeting of
hematopoietic/lymphoid (KIT) and soft-tissue tumors (PDGFRA).
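
The scoring scheme defined in the Table 1 footnotes can be written out as a short Python sketch. The record layout and all numbers below are hypothetical placeholders, and the sketch assumes that the OS of a gene is the sum of the GS values of its mutant peptides, which is how the OS footnote is read here.

```python
from collections import defaultdict


def mutation_impact_score(mutation_frequency, cancer_incidence):
    """MIS = frequency of the mutation in a cancer x incidence of that cancer."""
    return mutation_frequency * cancer_incidence


def epitope_impact_score(prediction_score, mean_hla_frequency):
    """EIS = epitope prediction score x average frequency of the presenting allele."""
    return prediction_score * mean_hla_frequency


def overall_scores(records):
    """Sum GS = MIS x EIS per gene and return genes ranked by descending OS."""
    totals = defaultdict(float)
    for r in records:
        mis = mutation_impact_score(r["mutation_frequency"], r["cancer_incidence"])
        eis = epitope_impact_score(r["prediction_score"], r["mean_hla_frequency"])
        totals[r["gene"]] += mis * eis
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder values for illustration; not taken from Table 1.
    toy_records = [
        {"gene": "GENE_A", "mutation_frequency": 0.2, "cancer_incidence": 50.0,
         "prediction_score": 0.8, "mean_hla_frequency": 0.10},
        {"gene": "GENE_A", "mutation_frequency": 0.1, "cancer_incidence": 50.0,
         "prediction_score": 0.5, "mean_hla_frequency": 0.20},
        {"gene": "GENE_B", "mutation_frequency": 0.5, "cancer_incidence": 10.0,
         "prediction_score": 0.4, "mean_hla_frequency": 0.05},
    ]
    for gene, score in overall_scores(toy_records):
        print(gene, round(score, 3))
```

Because every factor is multiplied, a gene scores highly only when its mutations are frequent, occur in common cancers, and yield strongly predicted epitopes for common alleles, which is the pattern the text describes for KRAS.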

4.  Discussion 

The  predicted  epitopes  presented  here  are  almost  certainly  an 
underestimate  of  those  that  actually  occur  in  cancer.  There  are 
several  reasons  for  this.  First,  we  have  applied  stringent  filters  for 
mutation  frequency  and  epitope  prediction  scores  that  may  have 
excluded  many  real  epitopes.  Second,  our  results  are  limited  by  the 
fact  that  epitope  prediction  tools  support  binding  models  for  only  a 
fraction  of  several  hundred  known  HLA  class  I  coding  variants.  For 
instance,  BIMAS,  SYFPEITHI,  and  IEDB  support  18,  22,  and  41  HLA 
class  I  coding  variants,  respectively.  Only  nine  HLA  class  I  coding 
variants  are  shared  by  at  least  two  of  the  three  epitope  prediction 
tools  and  only  four  variants  are  shared  by  all  three.  Considering  the 
19 alleles supported by at least two programs, and frequencies of
these alleles in the US population, we calculate that approximately
55% of the total US population would have at least one of these
alleles, and our study is essentially blind to the remaining 45%. This
is an inherent limitation of existing epitope prediction tools, and
affects all epitope prediction studies. Finally, it is important to
consider that mutation detection strategies have, to date, relied
heavily on capillary sequencing of polymerase chain reaction-amplified
coding sequence. This is a relatively insensitive approach
where mutations can be obscured by coamplification of wild-type
sequence from stromal cells. Single-molecule sequencing methods
produce a digital readout of allele sequences and promise greater
sensitivity [42]. Thus, with time, the number of candidate tumor
mutational epitopes will increase. Mutations now thought to be
rare may be shown to be present at more significant frequencies
and new mutations are likely to be discovered by ongoing cancer
genome screening efforts.
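
One way to estimate the share of a population carrying at least one supported allele is sketched below. The calculation method is not given in the text, so this is an assumption: it treats individuals as diploid, assumes Hardy-Weinberg proportions at each locus, and treats HLA loci as independent (ignoring linkage disequilibrium). The function name and the allele frequencies are hypothetical.

```python
def carrier_fraction(allele_freqs_by_locus):
    """Probability that an individual carries at least one listed allele.

    allele_freqs_by_locus: {locus: {allele: population frequency}}.
    Assumes Hardy-Weinberg proportions within each locus and independence
    between loci, which are simplifications made for this sketch.
    """
    p_none = 1.0
    for locus, freqs in allele_freqs_by_locus.items():
        p_supported = sum(freqs.values())
        if not 0.0 <= p_supported <= 1.0:
            raise ValueError(f"allele frequencies at {locus} sum to {p_supported}")
        # Both chromosomes must lack every supported allele at this locus.
        p_none *= (1.0 - p_supported) ** 2
    return 1.0 - p_none


if __name__ == "__main__":
    # Placeholder frequencies for illustration only.
    toy_frequencies = {
        "HLA-A": {"A*02:01": 0.25, "A*01:01": 0.15},
        "HLA-B": {"B*07:02": 0.10},
    }
    print(round(carrier_fraction(toy_frequencies), 4))
```

Under these assumptions, coverage rises quickly as binding models for additional common alleles become available, since each added allele shrinks the probability that both chromosomes at a locus are unsupported.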

While  the  concept  of  prophylactic  cancer  vaccination  has  been 
discussed  for  some  time,  there  have  been  no  previous  attempts  to 
estimate  the  potential  utility  of  such  vaccines,  in  terms  of  the 
proportion  of  individuals  that  may  benefit.  As  discussed  above,  the 
key  factors  to  consider  are  tumor  incidence,  mutation  frequency, 
and HLA restriction. HLA restriction rarely receives adequate
attention, but is a critical issue in vaccine development, given the
extensive allelic variation at this locus and the fact that efficient
presentation of a given epitope is limited to a subset of HLA alleles.
Here, for demonstration, we have estimated the proportion of the US
population that could benefit from hypothetical multivalent vaccines
targeting the relatively common colorectal and
hematological/lymphoid cancers. Specifically, we determined the proportion
of the US population that carries a presenting HLA allele for at least
one of the mutant peptides associated with the tumor in question
(Table 1). By this approach we estimate that hypothetically, 16.8%
of those who would otherwise develop a hematopoietic or lymphoid
cancer could be protected by vaccination with mutational
epitopes predicted here, and of those who would otherwise develop
colorectal cancer, 11.5% could be protected. Interestingly,
given the prevalence of KRAS mutations in both pancreatic and
colorectal cancer, a multivalent vaccine targeting colorectal cancer
could also protect 12% of individuals who would otherwise develop
pancreatic cancer.
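
The kind of estimate described above can be approximated with a simple weighted sum. The sketch below is not the authors' calculation, which is not spelled out in the text. It assumes that the listed mutations are mutually exclusive within a tumor type, that a patient counts as potentially protected when the tumor carries a targeted mutation and the patient carries an HLA allele predicted to present a peptide from it, and that mutation status and HLA type are independent. The function name and all numbers are hypothetical.

```python
def protected_fraction(mutations):
    """Estimate the fraction of would-be patients covered by a vaccine.

    mutations: list of (mutation_frequency, carrier_fraction) pairs, where
    mutation_frequency is the share of tumors of this type with the mutation
    and carrier_fraction is the share of the population carrying at least
    one HLA allele predicted to present a peptide from that mutation.
    """
    total_frequency = sum(freq for freq, _ in mutations)
    if total_frequency > 1.0:
        raise ValueError("mutation frequencies exceed 1; "
                         "the mutually exclusive assumption is violated")
    return sum(freq * carriers for freq, carriers in mutations)


if __name__ == "__main__":
    # Placeholder inputs for illustration; not values from Table 1.
    toy_mutations = [(0.20, 0.40), (0.10, 0.25)]
    print(round(protected_fraction(toy_mutations), 3))
```

The estimate is capped by the combined frequency of the targeted mutations, so broad protection requires both common hotspot mutations and epitopes presented by common alleles.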

The results presented here are purely for demonstration, but
suggest meaningful levels of cancer prophylaxis could be achievable
by vaccination. The results are limited by the fact that only
about half of the US population is estimated to carry at least one of
the alleles considered by the epitope prediction tools employed.
This, taken with our incomplete knowledge of recurrent tumor
mutations, means that we are likely underestimating significantly
the population reach of hypothetical vaccine constructs. Our
projections are also based on the premise that predicted epitopes are
immunogenic, and that immunogenic responses would be protective,
both of which require experimental validation. There is little
certainty as to which predicted epitopes correspond to high-value
early driver mutations, and some may be from later-onset mutations
that occur in established tumors, and therefore have limited
utility for immunoprevention. Finally, it is, of course, not possible to
predict who will or will not develop a spontaneous tumor; thus, our
estimates are based on vaccination of the entire population. In
principle, one could restrict vaccination to individuals carrying HLA
alleles known to present epitopes contained in a vaccine construct,
but the extra efforts required for HLA typing would detract from the
utility of population cancer control. A more feasible strategy for
targeted vaccination is in association with a cancer screening
program, for example, colonoscopy for detection of colorectal tumors.
Individuals with benign adenomatous polyps, identified by
colonoscopy, are at risk of progression to adenocarcinoma. Progression
from adenoma to adenocarcinoma is typically mediated by
mutations in KRAS, BRAF, and other oncogenes, as well as
loss-of-function mutations in TP53 and other tumor suppressors [43].
Thus, a progression-blocking vaccine based on KRAS and related
epitopes could be an effective preventative strategy for this tumor
site.